From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <me@serhei.io>
Received: from wout1-smtp.messagingengine.com (wout1-smtp.messagingengine.com
 [64.147.123.24])
 by sourceware.org (Postfix) with ESMTPS id 9F6CD385E036
 for <bunsen@sourceware.org>; Thu, 19 Aug 2021 12:41:29 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9F6CD385E036
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=serhei.io
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=serhei.io
Received: from compute2.internal (compute2.nyi.internal [10.202.2.42])
 by mailout.west.internal (Postfix) with ESMTP id B117E32009EB;
 Thu, 19 Aug 2021 08:41:26 -0400 (EDT)
Received: from imap21 ([10.202.2.71])
 by compute2.internal (MEProxy); Thu, 19 Aug 2021 08:41:26 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=serhei.io; h=
 mime-version:message-id:in-reply-to:references:date:from:to
 :subject:content-type; s=fm2; bh=cLfQTFIKZdKZXCEGkzbPczhJBLW73so
 WuOsw1yYllDA=; b=k+1gNCMWtiM1253ru2PsOst0H+n5Jxdx8KkULu4nroSsXPj
 K6uwrtmzzT5d/y/94XSxDU1nQsCvzKbqJ5V4Rki8zcIOD7R56DH6pdPkyb8mhWlQ
 DSNwSoEYRIk9mhN/74PDR4eIJWkT3gRWMSHy+FBu9DpfVjZvt2AMQLa127JDwjse
 nDamXy6URRD5CGD7JSJf4iIn6yQTYzADzffRJQGUeJ6HvVbACniOvoKdH/auapjC
 89CMxEgaxCRTTG6bbn5UE74RUyAZB4OVq4WtP0yaH58VLujefkauvBt1X1v+jb7o
 b48MWv8g2OEeLnmFWUT6kEAKx4BRSN1rkuGXHFQ==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=
 messagingengine.com; h=content-type:date:from:in-reply-to
 :message-id:mime-version:references:subject:to:x-me-proxy
 :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=cLfQTF
 IKZdKZXCEGkzbPczhJBLW73soWuOsw1yYllDA=; b=AVMcZRGSfRAtg6Pg0MpqwF
 sPiYo5qNrKcPID+uwq0OYJZqjcrihUYbOG9SNRkjDHfvZXSRruMdYVK/zaO4c3gS
 s4xuh/9pl6BONZXB4R/FewyzgI+sT8n7s/HJoDd9OGFEiRr3elXHWm7o4kqF6NLN
 z7prfMV90zyvZikRjMtA+nzwF64D3t5HTHDiIJm82nV28n2013viTT5r4cDobPZs
 mABrFowaarOLekR9FXJ+8h1gJBdK53Gy3orEgED8y/ZoxyCypU8Kao4IegaxnSVO
 ncqQaNMcI9/YduWs8S4jet+v8sgiQz/6yibTgTvQ5MZQD8h0spDUK3sNgiU/lNuA
 ==
X-ME-Sender: <xms:dVEeYajGriJfejFmyIeaycxJlI-rCfJQcdvlvTDu3WaYwtEpZ7F6Xw>
 <xme:dVEeYbChu-X3BpS7_1Pl11ToALwUECOqusSYeU0b71ymYZy0c8tvxi17uaAVWrAdc
 4GRs7eVi3NuFYJkHw>
X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvtddrleejgdehgecutefuodetggdotefrodftvf
 curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu
 uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc
 fjughrpefofgggkfgjfhffhffvufgtsehttdertderredtnecuhfhrohhmpedfufgvrhhh
 vghiucforghkrghrohhvfdcuoehmvgesshgvrhhhvghirdhioheqnecuggftrfgrthhtvg
 hrnhepgeduledugfehffdtueejudekheehudehudfffefhjeehledugfevgffhffehgfej
 necuffhomhgrihhnpehsvghrhhgvihdrihhonecuvehluhhsthgvrhfuihiivgeptdenuc
 frrghrrghmpehmrghilhhfrhhomhepmhgvsehsvghrhhgvihdrihho
X-ME-Proxy: <xmx:dVEeYSFQyAkyPfWEwhJfa9K7OPIfuaS-g527zaGj0fxBgnlXquyWHg>
 <xmx:dVEeYTQwl-Men-lZvE5QT-iB0UtCTKT9r5Q3aZepp5-3GMU6zh6-tA>
 <xmx:dVEeYXwC4oBKRYzMAb3B2-1Nul5YQLnkX5d40V-DMLDKHjiYYfcMrQ>
 <xmx:dlEeYYta0ERTuBWqfTJDCnMYIy3RUCzqCijqnrYgDJHgRFem2g08NQ>
Received: by mailuser.nyi.internal (Postfix, from userid 501)
 id 631A051C0061; Thu, 19 Aug 2021 08:41:25 -0400 (EDT)
X-Mailer: MessagingEngine.com Webmail Interface
User-Agent: Cyrus-JMAP/3.5.0-alpha0-1118-g75eff666e5-fm-20210816.002-g75eff666
Mime-Version: 1.0
Message-Id: <3ad2b380-d3f5-438d-bd38-3f16470a159a@www.fastmail.com>
In-Reply-To: <20210818192639.2362335-2-keiths@redhat.com>
References: <20210818192639.2362335-1-keiths@redhat.com>
 <20210818192639.2362335-2-keiths@redhat.com>
Date: Thu, 19 Aug 2021 08:40:55 -0400
From: "Serhei Makarov" <me@serhei.io>
To: "Keith Seitz" <keiths@redhat.com>, Bunsen <bunsen@sourceware.org>
Subject: Re: [PATCH 1/4] Rewrite gdb.parse_dejagnu_sum
Content-Type: text/plain
X-Spam-Status: No, score=-9.7 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL,
 RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: bunsen@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Bunsen mailing list <bunsen.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/bunsen>,
 <mailto:bunsen-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/bunsen/>
List-Help: <mailto:bunsen-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/bunsen>,
 <mailto:bunsen-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Thu, 19 Aug 2021 12:41:40 -0000

Hello Keith,

I had a look; the four patches look good to commit. I will do additional testing with the SystemTap data next week.

All the best,
      Serhei

On Wed, Aug 18, 2021, at 3:26 PM, Keith Seitz via Bunsen wrote:
> This patch rewrites gdb.parse_dejagnu_sum, making it significantly
> simpler and more reliable. With the current version of this function,
> I have been consistently seeing 8,000+ "missing" tests -- tests
> that are recorded in gdb.sum but never make it into the Bunsen
> database.
> 
> After chasing down a number of problems, I found it was much easier
> to simply rewrite this function. Consequently, my Bunsen imports of
> gdb.sum now account for every test.
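For reference while reviewing, here is a standalone sketch of the per-line handling that this patch introduces in parse_dejagnu_sum. It covers three kinds of line: regular outcome lines, bare ERROR lines, and unnamed tests ending in ".exp:". It is a minimal sketch, not the patch's code. The regex is an assumption standing in for get_expname_subtest(), whose definition is not shown in this patch, and classify() is a hypothetical name.

```python
import re

# ASSUMPTION: a stand-in for get_expname_subtest(), whose real regex is
# defined elsewhere in parse_dejagnu.py and is not part of this patch.
OUTCOME_RE = re.compile(
    r'^(?P<outcome>[A-Z]+): (?P<expname>\S+\.exp): (?P<subtest>.*)$')


def classify(line, last_exp):
    """Hypothetical helper: map one .sum line to (outcome, expname, subtest).

    Returns None for lines that carry no test result (e.g. "Running ...").
    """
    line = line.rstrip('\n')  # the patch keeps the newline; stripped here
    m = OUTCOME_RE.match(line)
    if m:
        return m.group('outcome'), m.group('expname'), m.group('subtest')
    if line.startswith('ERROR: '):
        # The "subtest" is the error reason, attributed to the last .exp seen.
        return 'ERROR', last_exp, line[len('ERROR: '):]
    if line.endswith('.exp:'):
        # Unnamed test: synthesize a subtest name and retry once
        # (the retried line no longer ends in ".exp:", so no recursion loop).
        return classify(line + ' UNNAMED_TEST', last_exp)
    return None


for sample in ['PASS: gdb.base/break.exp: continue\n',
               'ERROR: tcl error sourcing foo\n',
               'FAIL: gdb.base/break.exp:\n',
               'Running /x/gdb.base/break.exp ...\n']:
    print(classify(sample, 'gdb.base/break.exp'))
```

Returning None for non-result lines plays the role of the patch's `continue`. When consolidate_pass is false, every recognized line then becomes its own test case, with no .exp-boundary bookkeeping.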
> ---
>  scripts-main/gdb/parse_dejagnu.py | 206 ++++++++++++++++--------------
>  1 file changed, 110 insertions(+), 96 deletions(-)
> 
> diff --git a/scripts-main/gdb/parse_dejagnu.py b/scripts-main/gdb/parse_dejagnu.py
> index b56ed3c..fc6a30e 100755
> --- a/scripts-main/gdb/parse_dejagnu.py
> +++ b/scripts-main/gdb/parse_dejagnu.py
> @@ -71,116 +71,126 @@ def get_outcome_subtest(line):
>      if m is None: return None
>      return m.group('outcome'), m.group('subtest')
>  
> +# Normalize the test named NAME. NAME_DICT is used to track these.
> +#
> +# This is unfortunately quite complex.
> +
> +def normalize_name(name_dict, name):
> +
> +    assert(name is not None)
> +    assert(name != "")
> +
> +    # The buildbot does:
> +    #
> +    #  test_name = re.sub (r'(\s+)? \(.*\)$', r'', orig_name)
> +    #
> +    # But this is overly aggressive, causing thousands of duplicate
> +    # names to be recorded.
> +    #
> +    # Instead, try to remove known constant statuses. Unfortunately, this is
> +    # quite slow, but it is the most reliable way to keep 10,000 duplicate
> +    # names from invading the database.
> +    test_name = re.sub(r' \((PRMS.*|timeout|eof|GDB internal error'
> +                       r'|the program exited|the program is no longer running'
> +                       r'|got interactive prompt|got breakpoint menu'
> +                       r'|resync count exceeded|bad file format|file not found'
> +                       r'|incomplete note section|unexpected output'
> +                       r'|inferior_not_stopped|stopped at wrong place'
> +                       r'|unknown output after running|dwarf version unhandled'
> +                       r'|line numbers scrambled\?|out of virtual memory'
> +                       r'|missing filename|missing module'
> +                       r'|missing /usr/bin/prelink)\)', r'', name)
> +
> +    if test_name in name_dict:
> +        # If the test is already present in the file list, then
> +        # we include a unique identifier at the end of it, in the
> +        # form of '<<N>>' (where N is a number >= 2).  This is
> +        # useful because the GDB testsuite is full of non-unique
> +        # test messages.
> +        i = 2
> +        while True:
> +            nname = test_name + ' <<' + str (i) + '>>'
> +            if nname not in name_dict:
> +                break
> +            i += 1
> +        test_name = nname
> +
> +    name_dict[test_name] = test_name
> +    return test_name
> +
>  def parse_dejagnu_sum(testrun, sumfile, all_cases=None,
>                        consolidate_pass=False, verbose=True):
> -#                      consolidate_pass=True, verbose=True):
>      if testrun is None: return None
>      f = openfile_or_xz(sumfile)
>  
>      last_exp = None
> -    last_test_passed = False # at least one pass and no fails
> -    last_test_failed = False # at least one fail
> -    failed_subtests = [] # XXX Better known as 'unpassed'?
> -    passed_subtests = []
> -    failed_subtests_summary = 0
> -    passed_subtests_summary = 0
> +    counts = dict()
> +    names = dict()
> +
> +    # The global test_outcome_map doesn't contain all of our
> +    # outcomes. Add those now.
> +    test_outcome_map['PATH'] = 'PATH'  # Tests with paths in their names
> +
> +    # Clear counts dictionary
> +    counts = dict.fromkeys(test_outcome_map, 0)
>  
> +    # Iterate over lines in the sum file.
>      for cur in Cursor(sumfile, path=os.path.basename(sumfile), input_stream=f):
>          line = cur.line
>  
> -        # XXX need to handle several .sum formats
> -        # buildbot format :: all lines are outcome lines, include the .exp
> -        # regular format :: outcome lines separated by "Running <expname>.exp ..."
> -        outcome, expname, subtest = None, None, None
> +        # There's always an exception.  ERRORs are not output the same
> +        # way as other test results.  They simply list a reason.
> +        # FIXME: ERRORs typically span a range of lines
>          info = get_expname_subtest(line)
> -        if info is not None:
> -            outcome, expname, subtest = info
> -        elif (line.startswith("Running") and ".exp ..." in line):
> -            outcome = None
> -            expname = get_running_exp(line)
> -        else:
> -            info = get_outcome_subtest(line)
> -            if info is not None:
> -                outcome, subtest = info
> -
> -        # XXX these situations mark an .exp boundary:
> -        finished_exp = False
> -        if expname != last_exp and expname is not None and last_exp is not None:
> -            finished_exp = True
> -        elif "Summary ===" in line:
> -            finished_exp = True
> -
> -        if finished_exp:
> -            running_cur.line_end = cur.line_end-1
> -            if consolidate_pass and last_test_passed:
> -                testrun.add_testcase(name=last_exp,
> -                                     outcome='PASS',
> -                                     origin_sum=running_cur)
> -            elif last_test_passed:
> -                # Report each passed subtest individually:
> -                for passed_subtest, outcome, cursor in passed_subtests:
> -                    testrun.add_testcase(name=last_exp,
> -                                         outcome=outcome,
> -                                         subtest=passed_subtest,
> -                                         origin_sum=cursor)
> -            # Report all failed and untested subtests:
> -            for failed_subtest, outcome, cursor in failed_subtests:
> -                testrun.add_testcase(name=last_exp,
> -                                     outcome=outcome,
> -                                     subtest=failed_subtest,
> -                                     origin_sum=cursor)
> -
> -        if expname is not None and expname != last_exp:
> -            last_exp = expname
> -            running_cur = Cursor(start=cur)
> -            last_test_passed = False
> -            last_test_failed = False
> -            failed_subtests = []
> -            passed_subtests = []
> +        if info is None:
> +            if line.startswith('ERROR:'):
> +                # In this case, the "subtest" is actually the reason
> +                # for the failure. LAST_EXP is not necessarily strictly
> +                # correct, but we would have to watch for additional
> +                # messages (Running TESTFILE ...) to make this work properly.
> +                # In practice, it's not typically a problem.
> +                info = ('ERROR', last_exp, line[len('ERROR: '):])
> +            elif line.endswith(".exp:\n"):
> +                # An unnamed test. It happens.
> +                line = line[:-1] + " " + "UNNAMED_TEST" + "\n"
> +                info = get_expname_subtest(line)
> +                if info is None:
> +                    # We tried. Nothing else we can do.
> +                    print("WARNING: unknown expname/subtest in outcome line --", line, file=sys.stderr)
> +                    continue
> +            else:
> +                continue
>  
> -        if outcome is None:
> +        outcome, expname, subtest = info
> +
> +        # Warn and skip any outcome that is not in test_outcome_map!
> +        # It will cause an exception later.
> +        if outcome not in test_outcome_map:
> +            print(f'WARNING: unexpected test outcome ({outcome}) in line -- {line}')
>              continue
> -        # XXX The line contains a test outcome.
> -        synth_line = line
> -        if all_cases is not None and expname is None:
> -            # XXX force embed the expname into the line for later annotation code
> -            synth_line = str(outcome) + ": " + last_exp + ": " + str(subtest)
> -        all_cases.append(synth_line)
> -
> -        # TODO: Handle other dejagnu outcomes if they show up:
> -        if line.startswith("FAIL: ") \
> -           or line.startswith("KFAIL: ") \
> -           or line.startswith("XFAIL: ") \
> -           or line.startswith("ERROR: tcl error sourcing"):
> -            last_test_failed = True
> -            last_test_passed = False
> -            failed_subtests.append((line,
> -                                    check_mapping(line, test_outcome_map, start=True),
> -                                    cur)) # XXX single line
> -            failed_subtests_summary += 1
> -        if line.startswith("UNTESTED: ") \
> -           or line.startswith("UNSUPPORTED: ") \
> -           or line.startswith("UNRESOLVED: "):
> -            # don't update last_test_{passed,failed}
> -            failed_subtests.append((line,
> -                                    check_mapping(line, test_outcome_map, start=True),
> -                                    cur))
> -            # don't tally
> -        if line.startswith("PASS: ") \
> -           or line.startswith("XPASS: ") \
> -           or line.startswith("IPASS: "):
> -            if not last_test_failed: # no fails so far
> -                last_test_passed = True
> -            if not consolidate_pass:
> -                passed_subtests.append((line,
> -                                        check_mapping(line, test_outcome_map, start=True),
> -                                        cur))
> -            passed_subtests_summary += 1
> -    f.close()
> +        if last_exp != expname:
> +            last_exp = expname
> +            names.clear()
> +
> +        # Normalize the name to account for duplicates.
> +        subtest = normalize_name(names, subtest)
>  
> -    testrun.pass_count = passed_subtests_summary
> -    testrun.fail_count = failed_subtests_summary
> +        if all_cases is not None:
> +            # ERRORs are not appended to outcome_lines!
> +            if outcome != "ERROR":
> +                all_cases.append(line)
>  
> +        if consolidate_pass:
> +            pass # not implemented
> +        else:
> +            testrun.add_testcase(name=expname, outcome=outcome,
> +                                 subtest=subtest, origin_sum=cur)
> +            counts[outcome] += 1
> +    f.close()
> +
> +    testrun.pass_count = counts['PASS'] + counts['XPASS'] + counts['KPASS']
> +    testrun.fail_count = counts['FAIL'] + counts['XFAIL'] + counts['KFAIL'] \
> +        + counts['ERROR'] # UNTESTED, UNSUPPORTED, UNRESOLVED not tallied
>      return testrun
>  
>  def annotate_dejagnu_log(testrun, logfile, outcome_lines=[],
> @@ -218,7 +228,11 @@ def annotate_dejagnu_log(testrun, logfile, outcome_lines=[],
>      # (1b) Build a map of outcome_lines:
>      testcase_line_start = {} # .exp name -> index of first outcome_line with this name
>      for j in range(len(outcome_lines)):
> -        outcome, expname, subtest = get_expname_subtest(outcome_lines[j])
> +        info = get_expname_subtest(outcome_lines[j])
> +        if info is None:
> +            print("WARNING: unknown expname/subtest in outcome line --", outcome_lines[j], file=sys.stderr)
> +            continue
> +        outcome, expname, subtest = info
>          if expname not in testcase_line_start:
>              testcase_line_start[expname] = j
>  
> -- 
> 2.31.1
> 
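Here is a minimal standalone sketch that makes the duplicate handling in normalize_name concrete. It mirrors the patch's logic but uses a shortened status list: only a few of the suffixes that the patch strips are included, so STATUS_RE here is an illustrative subset, not the full alternation. Note that parse_dejagnu_sum calls names.clear() whenever the .exp file changes, so the <<N>> suffixes are unique only within one .exp file.

```python
import re

# Illustrative subset of the constant status suffixes stripped by the patch.
STATUS_RE = re.compile(r' \((PRMS.*|timeout|eof|GDB internal error'
                       r'|the program exited)\)')


def normalize_name(name_dict, name):
    """Strip known status suffixes, then make NAME unique within NAME_DICT."""
    assert name  # rejects both None and ""
    test_name = STATUS_RE.sub('', name)
    if test_name in name_dict:
        # Append the first free " <<N>>" suffix, starting at N = 2.
        i = 2
        while f'{test_name} <<{i}>>' in name_dict:
            i += 1
        test_name = f'{test_name} <<{i}>>'
    name_dict[test_name] = test_name
    return test_name


names = {}
print(normalize_name(names, 'continue (timeout)'))   # continue
print(normalize_name(names, 'continue'))             # continue <<2>>
print(normalize_name(names, 'continue (eof)'))       # continue <<3>>
print(normalize_name(names, 'print x (the value)'))  # print x (the value)
```

The last call shows why the patch avoids the buildbot's catch-all `\(.*\)$` pattern: a parenthesized part of a genuine test name survives. Because names is only ever checked for membership, a set would work as well as the dict.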
> 
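The final pass/fail tally in the patch can likewise be sketched on its own. The outcome groupings below are copied from the patch, and UNTESTED, UNSUPPORTED and UNRESOLVED are left out of both totals. tally() is a hypothetical helper, and collections.Counter stands in for the patch's pre-seeded dict.fromkeys(test_outcome_map, 0).

```python
from collections import Counter

# Groupings taken from the patch's pass_count / fail_count expressions.
PASS_OUTCOMES = ('PASS', 'XPASS', 'KPASS')
FAIL_OUTCOMES = ('FAIL', 'XFAIL', 'KFAIL', 'ERROR')


def tally(outcomes):
    """Return (pass_count, fail_count) for an iterable of outcome strings."""
    counts = Counter(outcomes)
    # Counter yields 0 for outcomes that never occurred.
    pass_count = sum(counts[o] for o in PASS_OUTCOMES)
    fail_count = sum(counts[o] for o in FAIL_OUTCOMES)
    return pass_count, fail_count


print(tally(['PASS', 'PASS', 'XFAIL', 'UNTESTED', 'ERROR', 'KPASS']))  # (3, 2)
```

A Counter returns 0 for any key it has not seen, so the final sums cannot raise KeyError. With the patch's pre-seeded dict, that guarantee holds only for outcomes that are keys of test_outcome_map.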


-- 
All the best,
    Serhei
    http://serhei.io