public inbox for gdb-patches@sourceware.org
* [PATCH] Improve analysis of racy testcases
@ 2016-02-17  1:26 Sergio Durigan Junior
  2016-02-25 17:01 ` Pedro Alves
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Sergio Durigan Junior @ 2016-02-17  1:26 UTC (permalink / raw)
  To: GDB Patches

This patch is a proposal to introduce some mechanisms to identify racy
testcases present in our testsuite.  As can be seen in previous
discussions, racy tests are really bothersome and cause our BuildBot to
pollute the gdb-testers mailing list with hundreds of false-positive
messages every month.  Hopefully, identifying these racy tests in
advance (and automatically) will reduce the noise traffic to
gdb-testers, maybe to the point where we will be able to send the
failure messages directly to the authors of the commits.

I spent some time trying to decide the best way to tackle this problem,
and decided that there is no silver bullet.  Racy tests are tricky and
it is difficult to catch them, so the best solution I could find (for
now?) is to run our testsuite a number of times in a row, and then
compare the results (i.e., the gdb.sum files generated during each run).
The more times you run the tests, the more racy tests you are likely to
detect (at the expense of waiting longer and longer).  You can also run
the tests in parallel, which makes things faster (and contributes to
catching more racy tests, because your machine will have fewer
resources for each test, and some tests are likely to fail when that
happens).  I did some tests on my machine (8-core i7, 16GB RAM), and
running the whole GDB testsuite 5 times using -j6 took 23 minutes.
Not bad.

In order to run the racy test machinery, you need to set the
RACY_ITER variable to the number of times you want to run the tests.
So, for example, if you want to run the whole testsuite 3 times in
parallel (using 2 cores), you would do:

  make check RACY_ITER=3 -j2

It is also possible to use the TESTS variable and specify which tests
you want to run:

  make check TESTS='gdb.base/default.exp' RACY_ITER=3 -j2

And so on.  The output files will be put in the
gdb/testsuite/racy_outputs/ directory.

After make invokes the necessary rules to run the tests, it finally runs
a Python script that will analyze the resulting gdb.sum files.  This
Python script will read each file, and construct a series of sets based
on the results of the tests (one set of FAILs, one of PASSes, one of
KFAILs, etc.).  It will then do some set operations and come up
with a list of unique, sorted testcases that are racy.  The algorithm
behind this is:

  for state in PASS, FAIL, XFAIL, XPASS...; do
    if a test's state in every sumfile is $state; then
      it is not racy
    else
      it is racy

IOW, a test must have the same state in every sumfile.
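
The per-state intersection can be sketched in a few lines of Python (a
simplified model, not the patch itself: the real script parses gdb.sum
files, while here each run is given directly as a dictionary):

```python
def find_racy_tests(runs):
    """RUNS is a list of dictionaries, one per sumfile, mapping a
    state (PASS, FAIL, ...) to the set of test names that ended in
    that state.  A test is racy unless it appears with the same
    state in every run."""
    # Intersect, per state, the test sets across all runs; tests in
    # the intersection behaved consistently.
    stable = {}
    for run in runs:
        for state, tests in run.items():
            stable[state] = stable.get(state, tests) & tests
    # Anything outside the intersection for its state is racy.
    racy = set()
    for run in runs:
        for state, tests in run.items():
            racy |= tests - stable[state]
    return sorted(racy)

run1 = {'PASS': {'test-a', 'test-b'}, 'FAIL': {'test-c'}}
run2 = {'PASS': {'test-a'}, 'FAIL': {'test-b', 'test-c'}}
print(find_racy_tests([run1, run2]))  # → ['test-b']
```

Here test-b flips between PASS and FAIL across the two runs, so it is
the only test reported as racy.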

After processing everything, the script prints the racy tests it could
identify on stdout.  I am redirecting this to a file named racy.sum.

Something else that I wasn't sure how to deal with was non-unique
messages in our testsuite.  I decided to do the same thing I do in our
BuildBot: include a unique identifier at the end of the message, like:

  gdb.base/xyz.exp: non-unique message
  gdb.base/xyz.exp: non-unique message <<2>>
  
This means that you will have to be careful about them when you use the
racy.sum file.
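
The disambiguation can be modeled like this (uniquify is a made-up
helper for illustration; the real logic lives in parse_sum_line in the
script below):

```python
def uniquify(messages):
    """Append ' <<N>>' (N >= 2) to repeated test messages so every
    entry ends up unique, mirroring what the sum parser does."""
    seen = set()
    out = []
    for msg in messages:
        name = msg
        i = 2
        # Bump the suffix until the name is unused.
        while name in seen:
            name = '%s <<%d>>' % (msg, i)
            i += 1
        seen.add(name)
        out.append(name)
    return out

msgs = ['gdb.base/xyz.exp: non-unique message'] * 3
for m in uniquify(msgs):
    print(m)
```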

I ran the script several times here, and it did a good job catching some
well-known racy tests.  Overall, I am satisfied with this approach and I
think it will be helpful to have it upstreamed.  I also intend to
extend our BuildBot and create new, specialized builders that will be
responsible for detecting the racy tests every X days, but that will
only be done once this patch is accepted.

Comments are welcome, as usual.  Thanks.

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/

gdb/testsuite/ChangeLog:
2016-02-17  Sergio Durigan Junior  <sergiodj@redhat.com>

	* Makefile.in (DEFAULT_RACY_ITER): New variable.
	(CHECK_TARGET_TMP): Likewise.
	(check-single-racy): New rule.
	(check-parallel-racy): Likewise.
	(TEST_TARGETS): Adjust rule to account for RACY_ITER.
	(do-check-parallel-racy): New rule.
	(check-racy/%.exp): Likewise.
	* README (Racy testcases): New section.
	* analyze-racy-logs.py: New file.

diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in
index 38c3052..4bf6191 100644
--- a/gdb/testsuite/Makefile.in
+++ b/gdb/testsuite/Makefile.in
@@ -52,6 +52,8 @@ RUNTESTFLAGS =
 
 FORCE_PARALLEL =
 
+DEFAULT_RACY_ITER = 3
+
 RUNTEST_FOR_TARGET = `\
   if [ -f $${srcdir}/../../dejagnu/runtest ]; then \
     echo $${srcdir}/../../dejagnu/runtest; \
@@ -140,7 +142,8 @@ installcheck:
 # given.  If RUNTESTFLAGS is not empty, then by default the tests will
 # be serialized.  This can be overridden by setting FORCE_PARALLEL to
 # any non-empty value.  For a non-GNU make, do not parallelize.
-@GMAKE_TRUE@CHECK_TARGET = $(if $(FORCE_PARALLEL),check-parallel,$(if $(RUNTESTFLAGS),check-single,$(if $(saw_dash_j),check-parallel,check-single)))
+@GMAKE_TRUE@CHECK_TARGET_TMP = $(if $(FORCE_PARALLEL),check-parallel,$(if $(RUNTESTFLAGS),check-single,$(if $(saw_dash_j),check-parallel,check-single)))
+@GMAKE_TRUE@CHECK_TARGET = $(if $(RACY_ITER),$(addsuffix -racy,$(CHECK_TARGET_TMP)),$(CHECK_TARGET_TMP))
 @GMAKE_FALSE@CHECK_TARGET = check-single
 
 # Note that we must resort to a recursive make invocation here,
@@ -190,6 +193,26 @@ DO_RUNTEST = \
 check-single:
 	$(DO_RUNTEST) $(RUNTESTFLAGS) $(expanded_tests_or_none)
 
+check-single-racy:
+	-rm -rf cache racy_outputs temp
+	mkdir -p racy_outputs; \
+	racyiter="$(RACY_ITER)"; \
+	test "x$$racyiter" = "x" && \
+	  racyiter=$(DEFAULT_RACY_ITER); \
+	if test $$racyiter -lt 2 ; then \
+	  echo "RACY_ITER must be at least 2."; \
+	  exit 1; \
+	fi; \
+	trap "exit" INT; \
+	for n in `seq $$racyiter` ; do \
+	  $(DO_RUNTEST) --outdir=racy_outputs/ $(RUNTESTFLAGS) \
+	    $(expanded_tests_or_none); \
+	  mv -f racy_outputs/gdb.sum racy_outputs/gdb$$n.sum; \
+	  mv -f racy_outputs/gdb.log racy_outputs/gdb$$n.log; \
+	done; \
+	$(srcdir)/analyze-racy-logs.py \
+	  `ls racy_outputs/gdb*.sum` > racy.sum
+
 check-parallel:
 	-rm -rf cache outputs temp
 	$(MAKE) -k do-check-parallel; \
@@ -201,6 +224,30 @@ check-parallel:
 	sed -n '/=== gdb Summary ===/,$$ p' gdb.sum; \
 	exit $$result
 
+check-parallel-racy:
+	-rm -rf cache racy_outputs temp
+	racyiter="$(RACY_ITER)"; \
+	test "x$$racyiter" = "x" && \
+	  racyiter=$(DEFAULT_RACY_ITER); \
+	if test $$racyiter -lt 2 ; then \
+	  echo "RACY_ITER must be at least 2."; \
+	  exit 1; \
+	fi; \
+	trap "exit" INT; \
+	for n in `seq $$racyiter` ; do \
+	  $(MAKE) -k do-check-parallel-racy \
+	    RACY_OUTPUT_N=$$n; \
+	  $(SHELL) $(srcdir)/dg-extract-results.sh \
+	    `find racy_outputs/racy$$n -name gdb.sum -print` > \
+	    racy_outputs/gdb$$n.sum; \
+	  $(SHELL) $(srcdir)/dg-extract-results.sh -L \
+	    `find racy_outputs/racy$$n -name gdb.log -print` > \
+	    gdb$$n.log; \
+	  sed -n '/=== gdb Summary ===/,$$ p' racy_outputs/gdb$$n.sum; \
+	done; \
+	$(srcdir)/analyze-racy-logs.py \
+	  `ls racy_outputs/gdb*.sum` > racy.sum
+
 # Turn a list of .exp files into "check/" targets.  Only examine .exp
 # files appearing in a gdb.* directory -- we don't want to pick up
 # lib/ by mistake.  For example, gdb.linespec/linespec.exp becomes
@@ -214,9 +261,9 @@ slow_tests = gdb.base/break-interp.exp gdb.base/interp.exp \
 	gdb.base/multi-forks.exp
 @GMAKE_TRUE@all_tests := $(shell cd $(srcdir) && find gdb.* -name '*.exp' -print)
 @GMAKE_TRUE@reordered_tests := $(slow_tests) $(filter-out $(slow_tests),$(all_tests))
-@GMAKE_TRUE@TEST_TARGETS := $(addprefix check/,$(reordered_tests))
+@GMAKE_TRUE@TEST_TARGETS := $(addprefix $(if $(RACY_ITER),check-racy,check)/,$(reordered_tests))
 @GMAKE_TRUE@else
-@GMAKE_TRUE@TEST_TARGETS := $(addprefix check/,$(expanded_tests_or_none))
+@GMAKE_TRUE@TEST_TARGETS := $(addprefix $(if $(RACY_ITER),check-racy,check)/,$(expanded_tests_or_none))
 @GMAKE_TRUE@endif
 
 do-check-parallel: $(TEST_TARGETS)
@@ -226,6 +273,15 @@ do-check-parallel: $(TEST_TARGETS)
 @GMAKE_TRUE@	-mkdir -p outputs/$*
 @GMAKE_TRUE@	@$(DO_RUNTEST) GDB_PARALLEL=yes --outdir=outputs/$* $*.exp $(RUNTESTFLAGS)
 
+do-check-parallel-racy: $(TEST_TARGETS)
+	@:
+
+@GMAKE_TRUE@check-racy/%.exp:
+@GMAKE_TRUE@	-mkdir -p racy_outputs/racy$(RACY_OUTPUT_N)/$*
+@GMAKE_TRUE@	$(DO_RUNTEST) GDB_PARALLEL=yes \
+@GMAKE_TRUE@	  --outdir=racy_outputs/racy$(RACY_OUTPUT_N)/$* $*.exp \
+@GMAKE_TRUE@	  $(RUNTESTFLAGS)
+
 check/no-matching-tests-found:
 	@echo ""
 	@echo "No matching tests found."
diff --git a/gdb/testsuite/README b/gdb/testsuite/README
index 6b59027..efcd459 100644
--- a/gdb/testsuite/README
+++ b/gdb/testsuite/README
@@ -50,6 +50,29 @@ to any non-empty value:
 If you want to use runtest directly instead of using the Makefile, see
 the description of GDB_PARALLEL below.
 
+Racy testcases
+**************
+
+Unfortunately, GDB has a number of testcases that can randomly pass or
+fail.  We call them "racy testcases", and they can be bothersome when
+one is comparing different testsuite runs.  In order to help
+identify them, it is possible to run the tests several times in a
+row and ask the testsuite machinery to analyze the results.  To do
+that, you need to pass the RACY_ITER variable to make:
+
+	make check RACY_ITER=5 -j4
+
+The value assigned to RACY_ITER represents the number of times you
+wish to run the tests in sequence (in the example above, the entire
+testsuite will be executed 5 times in a row, in parallel).  It is also
+possible to check just a specific test:
+
+	make check TESTS='gdb.base/default.exp' RACY_ITER=3
+
+After running the tests, you will see a file named 'racy.sum' in the
+gdb/testsuite directory.  You can also inspect the generated *.log and
+*.sum files in the gdb/testsuite/racy_outputs directory.
+
 Running the Performance Tests
 *****************************
 
diff --git a/gdb/testsuite/analyze-racy-logs.py b/gdb/testsuite/analyze-racy-logs.py
new file mode 100755
index 0000000..4f05185
--- /dev/null
+++ b/gdb/testsuite/analyze-racy-logs.py
@@ -0,0 +1,131 @@
+#!/usr/bin/env python
+
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# This file is part of GDB.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+
+# This program is used to analyze the test results (i.e., *.sum files)
+# generated by GDB's testsuite, and print the testcases that are found
+# to be racy.
+#
+# Racy testcases are considered as being testcases which can
+# intermitently FAIL (or PASS) when run two or more times
+# consecutively, i.e., tests whose results are not deterministic.
+#
+# This program is invoked when the user runs "make check" and
+# specifies the RACY_ITER environment variable.
+
+import sys
+import os
+import re
+
+# The (global) dictionary that stores the associations between a *.sum
+# file and its results.  The data inside it will be stored as:
+#
+# files_and_tests = { 'file1.sum' : { 'PASS' : { 'test1', 'test2' ... },
+#                                     'FAIL' : { 'test5', 'test6' ... },
+#                                     ...
+#                                   },
+#                   { 'file2.sum' : { 'PASS' : { 'test1', 'test3' ... },
+#                                   ...
+#                                   }
+#                   }
+
+files_and_tests = dict ()
+
+# We are interested in lines that start with '.?(PASS|FAIL)'.  In
+# other words, we don't process errors (maybe we should).
+
+sum_matcher = re.compile('^(.?(PASS|FAIL)): (.*)$')
+
+def parse_sum_line (line, dic):
+    """Parse a single LINE from a sumfile, and store the results in the
+dictionary referenced by DIC."""
+    global sum_matcher
+
+    line = line.rstrip ()
+    m = re.match (sum_matcher, line)
+    if m:
+        result = m.group (1)
+        test_name = m.group (3)
+        if result not in dic.keys ():
+            dic[result] = set ()
+        if test_name in dic[result]:
+            # If the line is already present in the dictionary, then
+            # we include a unique identifier in the end of it, in the
+            # form or '<<N>>' (where N is a number >= 2).  This is
+            # useful because the GDB testsuite is full of non-unique
+            # test messages; however, if you process the racy summary
+            # file you will also need to perform this same operation
+            # in order to identify the racy test.
+            i = 2
+            while True:
+                nname = test_name + ' <<' + str (i) + '>>'
+                if nname not in dic[result]:
+                    break
+                i += 1
+            test_name = nname
+        dic[result].add (test_name)
+
+def read_sum_files (files):
+    """Read the sumfiles (passed as a list in the FILES variable), and
+process each one, filling the FILES_AND_TESTS global dictionary with
+information about them. """
+    global files_and_tests
+
+    for x in files:
+        with open (x, 'r') as f:
+            files_and_tests[x] = dict ()
+            for line in f.readlines ():
+                parse_sum_line (line, files_and_tests[x])
+
+def identify_racy_tests ():
+    """Identify and print the racy tests.  This function basically works
+on sets, and the idea behind it is simple.  It takes all the sets that
+refer to the same result (for example, all the sets that contain PASS
+tests), and compares them.  If a test is present in all PASS sets, then
+it is not racy.  Otherwise, it is.
+
+This function does that for all sets (PASS, FAIL, KPASS, KFAIL, etc.),
+and then prints a sorted list (without duplicates) of all the tests
+that were found to be racy."""
+    global files_and_tests
+
+    total_set = dict ()
+    for item in files_and_tests:
+        for key in files_and_tests[item]:
+            if key not in total_set:
+                total_set[key] = files_and_tests[item][key]
+            else:
+                total_set[key] &= files_and_tests[item][key]
+
+    racy_tests = set ()
+    for item in files_and_tests:
+        for key in files_and_tests[item]:
+            racy_tests |= files_and_tests[item][key] - total_set[key]
+
+    for line in sorted (racy_tests):
+        print line
+
+if __name__ == '__main__':
+    if len (sys.argv) < 3:
+        # It only makes sense to invoke this program if you pass two
+        # or more files to be analyzed.
+        sys.exit ("Usage: %s [FILE] [FILE] ..." % sys.argv[0])
+    read_sum_files (sys.argv[1:])
+    identify_racy_tests ()
+    exit (0)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Improve analysis of racy testcases
  2016-02-17  1:26 [PATCH] Improve analysis of racy testcases Sergio Durigan Junior
@ 2016-02-25 17:01 ` Pedro Alves
  2016-02-28 21:44   ` Sergio Durigan Junior
  2016-02-25 18:34 ` Antoine Tremblay
  2016-03-01  5:51 ` [PATCH v2] " Sergio Durigan Junior
  2 siblings, 1 reply; 15+ messages in thread
From: Pedro Alves @ 2016-02-25 17:01 UTC (permalink / raw)
  To: Sergio Durigan Junior, GDB Patches

On 02/17/2016 01:26 AM, Sergio Durigan Junior wrote:
> This patch is a proposal to introduce some mechanisms to identify racy
> testcases present in our testsuite.  As can be seen in previous
> discussions, racy tests are really bothersome and cause our BuildBot to
> pollute the gdb-testers mailing list with hundreds of false-positives
> messages every month.  Hopefully, by identifying these racy tests in
> advance (and automatically) will contribute to the reduction of noise
> traffic to gdb-testers, maybe to the point where we will be able to send
> the failure messages directly to the authors of the commits.
> 
> I spent some time trying to decide the best way to tackle this problem,
> and decided that there is no silver bullet.  Racy tests are tricky and
> it is difficult to catch them, so the best solution I could find (for
> now?) is to run our testsuite a number of times in a row, and then
> compare the results (i.e., the gdb.sum files generated during each run).
> The more times you run the tests, the more racy tests you are likely to
> detect (at the expense of waiting longer and longer).  You can also run
> the tests in parallel, which makes things faster (and contribute to
> catching more racy tests, because your machine will have less resources
> for each test and some of them are likely to fail when this happens).  I
> did some tests in my machine (8-core i7, 16GB RAM), and running the
> whole GDB testsuite 5 times using -j6 took 23 minutes.  Not bad.
> 
> In order to run the racy test machinery, you need to specify the
> RACY_ITER environment variable.  You will assign a number to this
> variable, which represents the number of times you want to run the
> tests.  So, for example, if you want to run the whole testsuite 3 times
> in parallel (using 2 cores), you will do:
> 
>   make check RACY_ITER=3 -j2
> 
> It is also possible to use the TESTS variable and specify which tests
> you want to run:
> 
>   make check TEST='gdb.base/default.exp' RACY_ITER=3 -j2

I did this, and noticed:

$ ls testsuite/racy_outputs/
gdb1.log  gdb1.sum  gdb2.log  gdb2.sum  gdb3.log  gdb3.sum

I think it'd make more sense, and it'd be easier to copy
test run results around, diff result dirs etc., if these
were put in subdirs instead:

 testsuite/racy_outputs/1/gdb.sum
 testsuite/racy_outputs/1/gdb.log
 testsuite/racy_outputs/2/gdb.sum
 testsuite/racy_outputs/2/gdb.log

etc.  Any reason not to?

> 
> And so on.  The output files will be put at the directory
> gdb/testsuite/racy_outputs/.
> 
> After make invokes the necessary rules to run the tests, it finally runs
> a Python script that will analyze the resulting gdb.sum files.  This
> Python script will read each file, and construct a series of sets based
> on the results of the tests (one set for FAIL's, one for PASS'es, one
> for KFAIL's, etc.).  It will then do some set operations and come up
> with a list of unique, sorted testcases that are racy.  The algorithm
> behind this is:
> 
>   for state in PASS, FAIL, XFAIL, XPASS...; do
>     if a test's state in every sumfile is $state; then
>       it is not racy
>     else
>       it is racy
> 
> IOW, a test must have the same state in every sumfile.

Is it easy to start with preexisting dirs with gdb.sum/gdb.log
files, and run this script on them?

E.g., say I've already run the testsuite three times, and now
I want to check for racy tests, using the test results I already
have:

 testruns/2015-02-21/1/gdb.sum
 testruns/2015-02-21/2/gdb.sum
 testruns/2015-02-21/3/gdb.sum

I think we should document that.

> 
> After processing everything, the script prints the racy tests it could
> identify on stdout.  I am redirecting this to a file named racy.sum.

I tried:
 make check TESTS='gdb.base/default.exp' RACY_ITER=3

and was surprised to see nothing output at the end.  Then I found that
the racy.sum file is completely empty.  How about we instead make sure
that something is always printed?  Maybe follow dejagnu's output style,
and print a summary at the end?
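
Something like this, say (just a sketch of the idea, not part of the
patch; the trailer format is my own invention, loosely mimicking
dejagnu's "=== gdb Summary ==="):

```python
def print_racy_report(racy_tests):
    # Print each racy test, then an always-present trailer so an
    # empty result is still visible at a glance.
    for line in sorted(racy_tests):
        print(line)
    print()
    print('\t\t=== gdb racy summary ===')
    print()
    print('# of racy tests:\t%d' % len(racy_tests))

print_racy_report(set())  # even with no racy tests, the trailer shows up
```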

I've now run "make check -j8 RACY_ITER=3" and got this:

$ cat testsuite/racy.sum
gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 1), pending signal catch
gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 2), pending signal catch
gdb.threads/attach-into-signal.exp: nonthreaded: attempt 4: attach (pass 1), pending signal catch
gdb.threads/attach-into-signal.exp: nonthreaded: attempt 6: attach (pass 2), pending signal catch
gdb.threads/attach-into-signal.exp: threaded: attempt 1: attach (pass 1), pending signal catch
gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 1), pending signal catch
gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 2), pending signal catch
gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach
gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach (EPERM)
gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach
gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach (EPERM)
gdb.threads/fork-plus-threads.exp: detach-on-fork=off: only inferior 1 left
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: no threads left
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited (memory error) (PRMS: gdb/18749)
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: no threads left
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: no threads left
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (timeout) (PRMS: gdb/18749)
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: no threads left
gdb.threads/watchpoint-fork.exp: child: multithreaded: finish
gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint A after the second fork
gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint B after the second fork

The gdb.threads/process-dies-while-handling-bp.exp ones are actually:

 -PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
 +KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749)

Test sum diffing should probably strip away tail ()s, and ignore PASS->KFAIL->PASS.
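
Stripping the tail parens could look something like this (a sketch,
not part of the patch; the regex is my guess at what is needed):

```python
import re

def strip_tail_parens(message):
    """Drop trailing parenthesized annotations such as '(prompt)',
    '(timeout)' or '(PRMS: gdb/18749)', so PASS/KFAIL variants of
    the same test message compare equal."""
    return re.sub(r'(\s*\([^()]*\))+$', '', message)

msg = ('gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: '
       'cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749)')
print(strip_tail_parens(msg))
```

With that applied, the PASS and KFAIL lines above reduce to the same
message, and the PASS->KFAIL->PASS flip could then be filtered
separately.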


> 
> Something else that I wasn't sure how to deal with was non-unique
> messages in our testsuite.  I decided to do the same thing I do in our
> BuildBot: include a unique identifier in the end of message, like:
> 
>   gdb.base/xyz.exp: non-unique message
>   gdb.base/xyz.exp: non-unique message <<2>>
>   
> This means that you will have to be careful about them when you use the
> racy.sum file.
> 
> I ran the script several times here, and it did a good job catching some
> well-known racy tests.  Overall, I am satisfied with this approach and I
> think it will be helpful to have it upstream'ed. 

I think so too.  We don't have to fix everything before it's put in,
and having this doesn't preclude exploring other approaches either.

The one thing I do wish we'd do is use the fruits of this to somehow
mark racy tests in the testsuite itself, instead of only making the
BuildBot ignore them, so that local development benefits as well.

> I also intend to
> extend our BuildBot and create new, specialized builders that will be
> responsible for detecting the racy tests every X number of days, but
> that will only be done when this patch is accepted.

That'll be great.

> Comments are welcome, as usual.  Thanks.

On 02/17/2016 01:26 AM, Sergio Durigan Junior wrote:
> diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in
> index 38c3052..4bf6191 100644
> --- a/gdb/testsuite/Makefile.in
> +++ b/gdb/testsuite/Makefile.in
> @@ -52,6 +52,8 @@ RUNTESTFLAGS =
>
>  FORCE_PARALLEL =
>
> +DEFAULT_RACY_ITER = 3

Are users supposed to be able to trigger the default?  I didn't
see it documented.


> +Racy testcases
> +**************
> +
> +Unfortunately, GDB has a number of testcases that can randomly pass or
> +fail.  We call them "racy testcases", and they can be bothersome when
> +one is comparing different testsuite runs.  In order to help

I'd suggest future-proofing this, by rephrasing it as if we were at a
stage where the testsuite is stable, and new racy tests occasionally are
added to the testsuite.

>
> diff --git a/gdb/testsuite/analyze-racy-logs.py b/gdb/testsuite/analyze-racy-logs.py
> new file mode 100755
> index 0000000..4f05185
> --- /dev/null
> +++ b/gdb/testsuite/analyze-racy-logs.py
> @@ -0,0 +1,131 @@
> +#!/usr/bin/env python
> +
> +# Copyright (C) 2016 Free Software Foundation, Inc.
> +#
> +# This file is part of GDB.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +
> +# This program is used to analyze the test results (i.e., *.sum files)
> +# generated by GDB's testsuite, and print the testcases that are found
> +# to be racy.
> +#
> +# Racy testcases are considered as being testcases which can
> +# intermitently FAIL (or PASS) when run two or more times

intermittently

I think you should put this in as is, and we can keep working on
it on top.

Thanks,
Pedro Alves


* Re: [PATCH] Improve analysis of racy testcases
  2016-02-17  1:26 [PATCH] Improve analysis of racy testcases Sergio Durigan Junior
  2016-02-25 17:01 ` Pedro Alves
@ 2016-02-25 18:34 ` Antoine Tremblay
  2016-03-01  5:58   ` Sergio Durigan Junior
  2016-03-01  5:51 ` [PATCH v2] " Sergio Durigan Junior
  2 siblings, 1 reply; 15+ messages in thread
From: Antoine Tremblay @ 2016-02-25 18:34 UTC (permalink / raw)
  To: Sergio Durigan Junior; +Cc: GDB Patches


Sergio Durigan Junior writes:

> This patch is a proposal to introduce some mechanisms to identify racy
> testcases present in our testsuite.  As can be seen in previous
> discussions, racy tests are really bothersome and cause our BuildBot to
> pollute the gdb-testers mailing list with hundreds of false-positives
> messages every month.  Hopefully, by identifying these racy tests in
> advance (and automatically) will contribute to the reduction of noise
> traffic to gdb-testers, maybe to the point where we will be able to send
> the failure messages directly to the authors of the commits.
>
> I spent some time trying to decide the best way to tackle this problem,
> and decided that there is no silver bullet.  Racy tests are tricky and
> it is difficult to catch them, so the best solution I could find (for
> now?) is to run our testsuite a number of times in a row, and then
> compare the results (i.e., the gdb.sum files generated during each run).
> The more times you run the tests, the more racy tests you are likely to
> detect (at the expense of waiting longer and longer).  You can also run
> the tests in parallel, which makes things faster (and contribute to
> catching more racy tests, because your machine will have less resources
> for each test and some of them are likely to fail when this happens).  I
> did some tests in my machine (8-core i7, 16GB RAM), and running the
> whole GDB testsuite 5 times using -j6 took 23 minutes.  Not bad.
>
> In order to run the racy test machinery, you need to specify the
> RACY_ITER environment variable.  You will assign a number to this
> variable, which represents the number of times you want to run the
> tests.  So, for example, if you want to run the whole testsuite 3 times
> in parallel (using 2 cores), you will do:
>
>   make check RACY_ITER=3 -j2
>
> It is also possible to use the TESTS variable and specify which tests
> you want to run:
>
>   make check TEST='gdb.base/default.exp' RACY_ITER=3 -j2
>
> And so on.  The output files will be put at the directory
> gdb/testsuite/racy_outputs/.
>
> After make invokes the necessary rules to run the tests, it finally runs
> a Python script that will analyze the resulting gdb.sum files.  This
> Python script will read each file, and construct a series of sets based
> on the results of the tests (one set for FAIL's, one for PASS'es, one
> for KFAIL's, etc.).  It will then do some set operations and come up
> with a list of unique, sorted testcases that are racy.  The algorithm
> behind this is:
>
>   for state in PASS, FAIL, XFAIL, XPASS...; do
>     if a test's state in every sumfile is $state; then
>       it is not racy
>     else
>       it is racy
>
> IOW, a test must have the same state in every sumfile.
>
> After processing everything, the script prints the racy tests it could
> identify on stdout.  I am redirecting this to a file named racy.sum.
>
> Something else that I wasn't sure how to deal with was non-unique
> messages in our testsuite.  I decided to do the same thing I do in our
> BuildBot: include a unique identifier in the end of message, like:
>
>   gdb.base/xyz.exp: non-unique message
>   gdb.base/xyz.exp: non-unique message <<2>>
>   
> This means that you will have to be careful about them when you use the
> racy.sum file.
>
> I ran the script several times here, and it did a good job catching some
> well-known racy tests.  Overall, I am satisfied with this approach and I
> think it will be helpful to have it upstream'ed.  I also intend to
> extend our BuildBot and create new, specialized builders that will be
> responsible for detecting the racy tests every X number of days, but
> that will only be done when this patch is accepted.
>
> Comments are welcome, as usual.  Thanks.

Thanks for this!  It was quite a problem for me while testing on ARM.
I'm testing it now...

One note: it might be nice to output the list of non-racy tests too,
so that one could auto-build a list of tests to run from it, since I'm
not sure you can set an exclusion list?

Regards,
Antoine


* Re: [PATCH] Improve analysis of racy testcases
  2016-02-25 17:01 ` Pedro Alves
@ 2016-02-28 21:44   ` Sergio Durigan Junior
  2016-03-01 11:56     ` Pedro Alves
  0 siblings, 1 reply; 15+ messages in thread
From: Sergio Durigan Junior @ 2016-02-28 21:44 UTC (permalink / raw)
  To: Pedro Alves; +Cc: GDB Patches

On Thursday, February 25 2016, Pedro Alves wrote:

> On 02/17/2016 01:26 AM, Sergio Durigan Junior wrote:
>> This patch is a proposal to introduce some mechanisms to identify racy
>> testcases present in our testsuite.  As can be seen in previous
>> discussions, racy tests are really bothersome and cause our BuildBot to
>> pollute the gdb-testers mailing list with hundreds of false-positive
>> messages every month.  Hopefully, identifying these racy tests in
>> advance (and automatically) will contribute to reducing the noise
>> traffic to gdb-testers, maybe to the point where we will be able to send
>> the failure messages directly to the authors of the commits.
>> 
>> I spent some time trying to decide the best way to tackle this problem,
>> and decided that there is no silver bullet.  Racy tests are tricky and
>> it is difficult to catch them, so the best solution I could find (for
>> now?) is to run our testsuite a number of times in a row, and then
>> compare the results (i.e., the gdb.sum files generated during each run).
>> The more times you run the tests, the more racy tests you are likely to
>> detect (at the expense of waiting longer and longer).  You can also run
>> the tests in parallel, which makes things faster (and contributes to
>> catching more racy tests, because your machine will have fewer resources
>> for each test, and some tests are likely to fail when that happens).  I
>> did some tests on my machine (8-core i7, 16GB RAM), and running the
>> whole GDB testsuite 5 times using -j6 took 23 minutes.  Not bad.
>> 
>> In order to run the racy test machinery, you need to specify the
>> RACY_ITER environment variable.  You will assign a number to this
>> variable, which represents the number of times you want to run the
>> tests.  So, for example, if you want to run the whole testsuite 3 times
>> in parallel (using 2 cores), you will do:
>> 
>>   make check RACY_ITER=3 -j2
>> 
>> It is also possible to use the TESTS variable and specify which tests
>> you want to run:
>> 
>>   make check TESTS='gdb.base/default.exp' RACY_ITER=3 -j2
>
> I did this, and noticed:
>
> $ ls testsuite/racy_outputs/
> gdb1.log  gdb1.sum  gdb2.log  gdb2.sum  gdb3.log  gdb3.sum
>
> I think it'd make more sense, and it'd be easier to copy
> test run results around, diff result dirs etc., if these
> were put in subdirs instead:
>
>  testsuite/racy_outputs/1/gdb.sum
>  testsuite/racy_outputs/1/gdb.log
>  testsuite/racy_outputs/2/gdb.sum
>  testsuite/racy_outputs/2/gdb.log
>
> etc.  Any reason not to?

Not really.  I think it's a good idea, so I modified the Makefile to
implement it.

>> And so on.  The output files will be put at the directory
>> gdb/testsuite/racy_outputs/.
>> 
>> After make invokes the necessary rules to run the tests, it finally runs
>> a Python script that will analyze the resulting gdb.sum files.  This
>> Python script will read each file, and construct a series of sets based
>> on the results of the tests (one set for FAIL's, one for PASS'es, one
>> for KFAIL's, etc.).  It will then do some set operations and come up
>> with a list of unique, sorted testcases that are racy.  The algorithm
>> behind this is:
>> 
>>   for state in PASS, FAIL, XFAIL, XPASS...; do
>>     if a test's state in every sumfile is $state; then
>>       it is not racy
>>     else
>>       it is racy
>> 
>> IOW, a test must have the same state in every sumfile.
>
> Is it easy to start with preexisting dirs with gdb.sum/gdb.log
> files, and run this script on them?

Yes, you just need to provide two or more .sum files to the script.

> E.g., say I've already run the testsuite three times, and now
> I want to check for racy tests, using the test results I already
> have:
>
>  testruns/2015-02-21/1/gdb.sum
>  testruns/2015-02-21/2/gdb.sum
>  testruns/2015-02-21/3/gdb.sum
>
> I think we should document that.

Good idea.  I will write more about how to use the script if you already
have the .sum files.  I assume that gdb/testsuite/README is a good place
to put this information.

>> After processing everything, the script prints the racy tests it could
>> identify on stdout.  I am redirecting this to a file named racy.sum.
>
> I tried:
>  make check TESTS='gdb.base/default.exp' RACY_ITER=3
>
> and was surprised to see nothing output at the end.  Then I found that
> the racy.sum file is completely empty.  How about we instead make sure
> that something is already printed?  Maybe follow dejagnu's output style,
> and print a summary at the end?

Alright, sounds doable.

> I've now run "make check -j8 RACY_ITER=3" and got this:
>
> $ cat testsuite/racy.sum
> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 1), pending signal catch
> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 2), pending signal catch
> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 4: attach (pass 1), pending signal catch
> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 6: attach (pass 2), pending signal catch
> gdb.threads/attach-into-signal.exp: threaded: attempt 1: attach (pass 1), pending signal catch
> gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 1), pending signal catch
> gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 2), pending signal catch
> gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach
> gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach (EPERM)
> gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach
> gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach (EPERM)
> gdb.threads/fork-plus-threads.exp: detach-on-fork=off: only inferior 1 left
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: no threads left
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited (memory error) (PRMS: gdb/18749)
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: no threads left
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: no threads left
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (timeout) (PRMS: gdb/18749)
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: no threads left
> gdb.threads/watchpoint-fork.exp: child: multithreaded: finish
> gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint A after the second fork
> gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint B after the second fork
>
> The gdb.threads/process-dies-while-handling-bp.exp ones are actually:
>
>  -PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
>  +KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749)
>
> Test sum diffing should probably strip away tail ()s, and ignore PASS->KFAIL->PASS.

I thought about stripping tail ()s away before comparing the names, but
the problem is that maybe we'll miss a test that actually writes
something meaningful inside the parentheses.  There isn't a strong
convention advising testcase writers not to do that.  What do you think?
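For reference, tail stripping along these lines could look like the
sketch below (`strip_tail_parens` is a made-up helper name, and the
pattern is an assumption for illustration, not necessarily what the
final patch uses):

```python
import re

def strip_tail_parens(test_name):
    """Drop everything from the first '(' (and any preceding
    whitespace) to the end of the test message."""
    return re.sub(r"(\s+)?\(.*$", "", test_name)

# Both parenthesized tails are removed at once, since the match
# runs to the end of the string.
print(strip_tail_parens("inferior 1 exited (prompt) (PRMS: gdb/18749)"))
```

The downside Sergio mentions applies here too: any meaningful text in
parentheses is discarded along with the noise.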

As for the PASS->KFAIL->PASS, that's a good idea indeed.  I'll implement
that.

>> 
>> Something else that I wasn't sure how to deal with was non-unique
>> messages in our testsuite.  I decided to do the same thing I do in our
>> BuildBot: include a unique identifier at the end of the message, like:
>> 
>>   gdb.base/xyz.exp: non-unique message
>>   gdb.base/xyz.exp: non-unique message <<2>>
>>   
>> This means that you will have to be careful about them when you use the
>> racy.sum file.
>> 
>> I ran the script several times here, and it did a good job catching some
>> well-known racy tests.  Overall, I am satisfied with this approach and I
>> think it will be helpful to have it upstream'ed. 
>
> I think so too.  We don't have to fix everything before its put in either.
> And having this doesn't preclude exploring other approaches either.
>
> The only thing I do wish we'd do is to use the fruits of this
> to somehow mark racy tests in the testsuite itself, instead of only
> making the buildbot ignore them, so that local development benefits
> as well.

I totally agree, and also spent some time thinking about this problem,
but I don't see an easy solution for that.  Racy testcases vary wildly
between targets, GDB vs. gdbserver, CFLAGS, etc.  We would have to
maintain several lists of racy tests, and even this wouldn't be 100%
trustworthy because the racy tests also vary depending on the machine
load...

I'm eager to see what you/others think about this specific problem.

>> I also intend to
>> extend our BuildBot and create new, specialized builders that will be
>> responsible for detecting the racy tests every X number of days, but
>> that will only be done when this patch is accepted.
>
> That'll be great.
>
>> Comments are welcome, as usual.  Thanks.
>
> On 02/17/2016 01:26 AM, Sergio Durigan Junior wrote:
>> diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in
>> index 38c3052..4bf6191 100644
>> --- a/gdb/testsuite/Makefile.in
>> +++ b/gdb/testsuite/Makefile.in
>> @@ -52,6 +52,8 @@ RUNTESTFLAGS =
>>
>>  FORCE_PARALLEL =
>>
>> +DEFAULT_RACY_ITER = 3
>
> Are users supposed to be able to trigger the default?  I didn't
> see it documented.

Only if they invoke the rule by hand without specifying RACY_ITER, like:

  make check-racy-parallel -j4

In this case, DEFAULT_RACY_ITER would be used.  And yes, I did not
document it, but I should.  Thanks for mentioning.

>> +Racy testcases
>> +**************
>> +
>> +Unfortunately, GDB has a number of testcases that can randomly pass or
>> +fail.  We call them "racy testcases", and they can be bothersome when
>> +one is comparing different testsuite runs.  In order to help
>
> I'd suggest future-proofing this, by rephrasing it as if we were at a
> stage where the testsuite is stable, and new racy tests occasionally are
> added to the testsuite.

Gotcha.

>> diff --git a/gdb/testsuite/analyze-racy-logs.py b/gdb/testsuite/analyze-racy-logs.py
>> new file mode 100755
>> index 0000000..4f05185
>> --- /dev/null
>> +++ b/gdb/testsuite/analyze-racy-logs.py
>> @@ -0,0 +1,131 @@
>> +#!/usr/bin/env python
>> +
>> +# Copyright (C) 2016 Free Software Foundation, Inc.
>> +#
>> +# This file is part of GDB.
>> +#
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> +
>> +
>> +# This program is used to analyze the test results (i.e., *.sum files)
>> +# generated by GDB's testsuite, and print the testcases that are found
>> +# to be racy.
>> +#
>> +# Racy testcases are considered as being testcases which can
>> +# intermitently FAIL (or PASS) when run two or more times
>
> intermittently

Oops, thanks.

> I think you should put this in as is, and we can keep working on
> it on top.

I'll send a v2 in a few minutes with your suggestions addressed, and
then I'll push it if you're OK with it.

Thanks a lot!

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/


* [PATCH v2] Improve analysis of racy testcases
  2016-02-17  1:26 [PATCH] Improve analysis of racy testcases Sergio Durigan Junior
  2016-02-25 17:01 ` Pedro Alves
  2016-02-25 18:34 ` Antoine Tremblay
@ 2016-03-01  5:51 ` Sergio Durigan Junior
  2016-03-01 12:41   ` Pedro Alves
  2 siblings, 1 reply; 15+ messages in thread
From: Sergio Durigan Junior @ 2016-03-01  5:51 UTC (permalink / raw)
  To: GDB Patches; +Cc: Pedro Alves

This is the second version of the patch.  It addresses the following
comments from Pedro:

- Document the DEFAULT_RACY_ITER variable (on gdb/testsuite/Makefile.in
  and gdb/testsuite/README)

- Rephrase the "Racy testcases" section on gdb/testsuite/README

- Improve the output of the racy analyzer in order to make it look more
  like DejaGNU's output.

- Implement a mechanism for ignoring specific testcases based on their
  results (e.g., PASS -> KFAIL).


This patch is a proposal to introduce some mechanisms to identify racy
testcases present in our testsuite.  As can be seen in previous
discussions, racy tests are really bothersome and cause our BuildBot to
pollute the gdb-testers mailing list with hundreds of false-positive
messages every month.  Hopefully, identifying these racy tests in
advance (and automatically) will contribute to reducing the noise
traffic to gdb-testers, maybe to the point where we will be able to send
the failure messages directly to the authors of the commits.

I spent some time trying to decide the best way to tackle this problem,
and decided that there is no silver bullet.  Racy tests are tricky and
it is difficult to catch them, so the best solution I could find (for
now?) is to run our testsuite a number of times in a row, and then
compare the results (i.e., the gdb.sum files generated during each run).
The more times you run the tests, the more racy tests you are likely to
detect (at the expense of waiting longer and longer).  You can also run
the tests in parallel, which makes things faster (and contributes to
catching more racy tests, because your machine will have fewer resources
for each test, and some tests are likely to fail when that happens).  I
did some tests on my machine (8-core i7, 16GB RAM), and running the
whole GDB testsuite 5 times using -j6 took 23 minutes.  Not bad.

In order to run the racy test machinery, you need to specify the
RACY_ITER environment variable.  You will assign a number to this
variable, which represents the number of times you want to run the
tests.  So, for example, if you want to run the whole testsuite 3 times
in parallel (using 2 cores), you will do:

  make check RACY_ITER=3 -j2

It is also possible to use the TESTS variable and specify which tests
you want to run:

  make check TESTS='gdb.base/default.exp' RACY_ITER=3 -j2

And so on.  The output files will be put at the directory
gdb/testsuite/racy_outputs/.

After make invokes the necessary rules to run the tests, it finally runs
a Python script that will analyze the resulting gdb.sum files.  This
Python script will read each file, and construct a series of sets based
on the results of the tests (one set for FAIL's, one for PASS'es, one
for KFAIL's, etc.).  It will then do some set operations and come up
with a list of unique, sorted testcases that are racy.  The algorithm
behind this is:

  for state in PASS, FAIL, XFAIL, XPASS...; do
    if a test's state in every sumfile is $state; then
      it is not racy
    else
      it is racy

IOW, a test must have the same state in every sumfile.
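In rough Python terms (with made-up run data, not the script's actual
internals), the rule above amounts to:

```python
# Each run maps a state to the set of tests reported in that state.
runs = [
    {"PASS": {"t1", "t2"}, "FAIL": {"t3"}},
    {"PASS": {"t1"},       "FAIL": {"t2", "t3"}},
]

# A test is non-racy for a state only if every run reports it in
# that state; intersecting the per-run sets enforces this.
nonracy = {}
for run in runs:
    for state, tests in run.items():
        nonracy[state] = nonracy.get(state, tests) & tests

# Anything reported in some run but absent from the stable set for
# that state has flipped state at least once, so it is racy.
racy = set()
for run in runs:
    for state, tests in run.items():
        racy |= tests - nonracy[state]

print(sorted(racy))  # t2 flipped PASS -> FAIL, so only t2 is racy
```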

After processing everything, the script prints the racy tests it could
identify on stdout.  I am redirecting this to a file named racy.sum.

Something else that I wasn't sure how to deal with was non-unique
messages in our testsuite.  I decided to do the same thing I do in our
BuildBot: include a unique identifier at the end of the message, like:

  gdb.base/xyz.exp: non-unique message
  gdb.base/xyz.exp: non-unique message <<2>>
  
This means that you will have to be careful about them when you use the
racy.sum file.
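The '<<N>>' uniquifying scheme can be sketched as follows (the helper
name is made up; any tool consuming racy.sum would need to apply the
same transformation to match names up):

```python
def uniquify(name, seen):
    """Append ' <<N>>' (N >= 2) until NAME is unique within SEEN."""
    if name not in seen:
        return name
    i = 2
    while "%s <<%d>>" % (name, i) in seen:
        i += 1
    return "%s <<%d>>" % (name, i)

seen = set()
# Three identical messages become three distinct entries.
for msg in ["gdb.base/xyz.exp: non-unique message"] * 3:
    seen.add(uniquify(msg, seen))

print(sorted(seen))
```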

I ran the script several times here, and it did a good job catching some
well-known racy tests.  Overall, I am satisfied with this approach and I
think it will be helpful to have it upstreamed.  I also intend to
extend our BuildBot and create new, specialized builders that will be
responsible for detecting the racy tests every X number of days, but
that will only be done when this patch is accepted.

Comments are welcome, as usual.  Thanks.

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/

gdb/testsuite/ChangeLog:
2016-03-01  Sergio Durigan Junior  <sergiodj@redhat.com>

	* Makefile.in (DEFAULT_RACY_ITER): New variable.
	(CHECK_TARGET_TMP): Likewise.
	(check-single-racy): New rule.
	(check-parallel-racy): Likewise.
	(TEST_TARGETS): Adjust rule to account for RACY_ITER.
	(do-check-parallel-racy): New rule.
	(check-racy/%.exp): Likewise.
	* README (Racy testcases): New section.
	* analyze-racy-logs.py: New file.

diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in
index 38c3052..b5e53ad 100644
--- a/gdb/testsuite/Makefile.in
+++ b/gdb/testsuite/Makefile.in
@@ -52,6 +52,12 @@ RUNTESTFLAGS =
 
 FORCE_PARALLEL =
 
+# Default number of iterations that we will use to run the testsuite
+# if the user does not specify the RACY_ITER environment variable
+# (e.g., when the user calls the make rule directly from the command
+# line).
+DEFAULT_RACY_ITER = 3
+
 RUNTEST_FOR_TARGET = `\
   if [ -f $${srcdir}/../../dejagnu/runtest ]; then \
     echo $${srcdir}/../../dejagnu/runtest; \
@@ -140,7 +146,8 @@ installcheck:
 # given.  If RUNTESTFLAGS is not empty, then by default the tests will
 # be serialized.  This can be overridden by setting FORCE_PARALLEL to
 # any non-empty value.  For a non-GNU make, do not parallelize.
-@GMAKE_TRUE@CHECK_TARGET = $(if $(FORCE_PARALLEL),check-parallel,$(if $(RUNTESTFLAGS),check-single,$(if $(saw_dash_j),check-parallel,check-single)))
+@GMAKE_TRUE@CHECK_TARGET_TMP = $(if $(FORCE_PARALLEL),check-parallel,$(if $(RUNTESTFLAGS),check-single,$(if $(saw_dash_j),check-parallel,check-single)))
+@GMAKE_TRUE@CHECK_TARGET = $(if $(RACY_ITER),$(addsuffix -racy,$(CHECK_TARGET_TMP)),$(CHECK_TARGET_TMP))
 @GMAKE_FALSE@CHECK_TARGET = check-single
 
 # Note that we must resort to a recursive make invocation here,
@@ -190,6 +197,26 @@ DO_RUNTEST = \
 check-single:
 	$(DO_RUNTEST) $(RUNTESTFLAGS) $(expanded_tests_or_none)
 
+check-single-racy:
+	-rm -rf cache racy_outputs temp
+	mkdir -p racy_outputs; \
+	racyiter="$(RACY_ITER)"; \
+	test "x$$racyiter" = "x" && \
+	  racyiter=$(DEFAULT_RACY_ITER); \
+	if test $$racyiter -lt 2 ; then \
+	  echo "RACY_ITER must be at least 2."; \
+	  exit 1; \
+	fi; \
+	trap "exit" INT; \
+	for n in `seq $$racyiter` ; do \
+	  mkdir -p racy_outputs/$$n; \
+	  $(DO_RUNTEST) --outdir=racy_outputs/$$n $(RUNTESTFLAGS) \
+	    $(expanded_tests_or_none); \
+	done; \
+	$(srcdir)/analyze-racy-logs.py \
+	  `ls racy_outputs/*/gdb.sum` > racy.sum; \
+	sed -n '/=== gdb Summary ===/,$$ p' racy.sum
+
 check-parallel:
 	-rm -rf cache outputs temp
 	$(MAKE) -k do-check-parallel; \
@@ -201,6 +228,31 @@ check-parallel:
 	sed -n '/=== gdb Summary ===/,$$ p' gdb.sum; \
 	exit $$result
 
+check-parallel-racy:
+	-rm -rf cache racy_outputs temp
+	racyiter="$(RACY_ITER)"; \
+	test "x$$racyiter" = "x" && \
+	  racyiter=$(DEFAULT_RACY_ITER); \
+	if test $$racyiter -lt 2 ; then \
+	  echo "RACY_ITER must be at least 2."; \
+	  exit 1; \
+	fi; \
+	trap "exit" INT; \
+	for n in `seq $$racyiter` ; do \
+	  $(MAKE) -k do-check-parallel-racy \
+	    RACY_OUTPUT_N=$$n; \
+	  $(SHELL) $(srcdir)/dg-extract-results.sh \
+	    `find racy_outputs/$$n -name gdb.sum -print` > \
+	    racy_outputs/$$n/gdb.sum; \
+	  $(SHELL) $(srcdir)/dg-extract-results.sh -L \
+	    `find racy_outputs/$$n -name gdb.log -print` > \
+	    racy_outputs/$$n/gdb.log; \
+	  sed -n '/=== gdb Summary ===/,$$ p' racy_outputs/$$n/gdb.sum; \
+	done; \
+	$(srcdir)/analyze-racy-logs.py \
+	  `ls racy_outputs/*/gdb.sum` > racy.sum; \
+	sed -n '/=== gdb Summary ===/,$$ p' racy.sum
+
 # Turn a list of .exp files into "check/" targets.  Only examine .exp
 # files appearing in a gdb.* directory -- we don't want to pick up
 # lib/ by mistake.  For example, gdb.linespec/linespec.exp becomes
@@ -214,9 +266,9 @@ slow_tests = gdb.base/break-interp.exp gdb.base/interp.exp \
 	gdb.base/multi-forks.exp
 @GMAKE_TRUE@all_tests := $(shell cd $(srcdir) && find gdb.* -name '*.exp' -print)
 @GMAKE_TRUE@reordered_tests := $(slow_tests) $(filter-out $(slow_tests),$(all_tests))
-@GMAKE_TRUE@TEST_TARGETS := $(addprefix check/,$(reordered_tests))
+@GMAKE_TRUE@TEST_TARGETS := $(addprefix $(if $(RACY_ITER),check-racy,check)/,$(reordered_tests))
 @GMAKE_TRUE@else
-@GMAKE_TRUE@TEST_TARGETS := $(addprefix check/,$(expanded_tests_or_none))
+@GMAKE_TRUE@TEST_TARGETS := $(addprefix $(if $(RACY_ITER),check-racy,check)/,$(expanded_tests_or_none))
 @GMAKE_TRUE@endif
 
 do-check-parallel: $(TEST_TARGETS)
@@ -226,6 +278,15 @@ do-check-parallel: $(TEST_TARGETS)
 @GMAKE_TRUE@	-mkdir -p outputs/$*
 @GMAKE_TRUE@	@$(DO_RUNTEST) GDB_PARALLEL=yes --outdir=outputs/$* $*.exp $(RUNTESTFLAGS)
 
+do-check-parallel-racy: $(TEST_TARGETS)
+	@:
+
+@GMAKE_TRUE@check-racy/%.exp:
+@GMAKE_TRUE@	-mkdir -p racy_outputs/$(RACY_OUTPUT_N)/$*
+@GMAKE_TRUE@	$(DO_RUNTEST) GDB_PARALLEL=yes \
+@GMAKE_TRUE@	  --outdir=racy_outputs/$(RACY_OUTPUT_N)/$* $*.exp \
+@GMAKE_TRUE@	  $(RUNTESTFLAGS)
+
 check/no-matching-tests-found:
 	@echo ""
 	@echo "No matching tests found."
diff --git a/gdb/testsuite/README b/gdb/testsuite/README
index 6b59027..043a8bd 100644
--- a/gdb/testsuite/README
+++ b/gdb/testsuite/README
@@ -50,6 +50,51 @@ to any non-empty value:
 If you want to use runtest directly instead of using the Makefile, see
 the description of GDB_PARALLEL below.
 
+Racy testcases
+**************
+
+Sometimes, new testcases are added to the testsuite that are not
+entirely deterministic, and can randomly pass or fail.  We call them
+"racy testcases", and they can be bothersome when one is comparing
+different testsuite runs.  In order to help identify them, it is
+possible to run the tests several times in a row and ask the testsuite
+machinery to analyze the results.  To do that, you need to specify the
+RACY_ITER environment variable to make:
+
+	make check RACY_ITER=5 -j4
+
+The value assigned to RACY_ITER represents the number of times you
+wish to run the tests in sequence (in the example above, the entire
+testsuite will be executed 5 times in a row, in parallel).  It is also
+possible to check just a specific test:
+
+	make check TESTS='gdb.base/default.exp' RACY_ITER=3
+
+One can also decide to call the Makefile rules by hand inside the
+gdb/testsuite directory, e.g.:
+
+	make check-parallel-racy -j4
+
+In which case the value of the DEFAULT_RACY_ITER variable (inside
+gdb/testsuite/Makefile.in) will be used to determine how many
+iterations will be run.
+
+After running the tests, you will see a file named 'racy.sum' in the
+gdb/testsuite directory.  You can also inspect the generated *.log and
+*.sum files by looking into the gdb/testsuite/racy_outputs directory.
+
+If you already have *.sum files generated from previous testsuite runs
+and you would like to analyze them without having to run the testsuite
+again, you can also use the 'analyze-racy-logs.py' script directly.
+It is located in the gdb/testsuite/ directory, and it expects a list
+of two or more *.sum files to be provided as its argument.  For
+example:
+
+	./gdb/testsuite/analyze-racy-logs.py testsuite-01/gdb.sum \
+	  testsuite-02/gdb.sum testsuite-03/gdb.sum
+
+The script will output its analysis report to the standard output.
+
 Running the Performance Tests
 *****************************
 
diff --git a/gdb/testsuite/analyze-racy-logs.py b/gdb/testsuite/analyze-racy-logs.py
new file mode 100755
index 0000000..1083b51
--- /dev/null
+++ b/gdb/testsuite/analyze-racy-logs.py
@@ -0,0 +1,177 @@
+#!/usr/bin/env python
+
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# This file is part of GDB.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+
+# This program is used to analyze the test results (i.e., *.sum files)
+# generated by GDB's testsuite, and print the testcases that are found
+# to be racy.
+#
+# Racy testcases are considered as being testcases which can
+# intermittently FAIL (or PASS) when run two or more times
+# consecutively, i.e., tests whose results are not deterministic.
+#
+# This program is invoked when the user runs "make check" and
+# specifies the RACY_ITER environment variable.
+
+import sys
+import os
+import re
+
+# The (global) dictionary that stores the associations between a *.sum
+# file and its results.  The data inside it will be stored as:
+#
+# files_and_tests = { 'file1.sum' : { 'PASS' : { 'test1', 'test2' ... },
+#                                     'FAIL' : { 'test5', 'test6' ... },
+#                                     ...
+#                                   },
+#                   { 'file2.sum' : { 'PASS' : { 'test1', 'test3' ... },
+#                                   ...
+#                                   }
+#                   }
+
+files_and_tests = dict ()
+
+# The relationships between various states of the same tests that
+# should be ignored.  For example, if the same test PASSes on one
+# testcase run but KFAILs on another, this test should not be
+# considered racy because a known failure is...  known.
+
+ignore_relations = { 'PASS' : 'KFAIL' }
+
+# We are interested in lines that start with '.?(PASS|FAIL)'.  In
+# other words, we don't process errors (maybe we should).
+
+sum_matcher = re.compile('^(.?(PASS|FAIL)): (.*)$')
+
+def parse_sum_line (line, dic):
+    """Parse a single LINE from a sumfile, and store the results in the
+dictionary referenced by DIC."""
+    global sum_matcher
+
+    line = line.rstrip ()
+    m = re.match (sum_matcher, line)
+    if m:
+        result = m.group (1)
+        test_name = m.group (3)
+        # Remove tail parentheses.  These are likely to be '(timeout)'
+        # and other extra information that will only confuse us.
+        test_name = re.sub (r'(\s+)?\(.*$', '', test_name)
+        if result not in dic.keys ():
+            dic[result] = set ()
+        if test_name in dic[result]:
+            # If the line is already present in the dictionary, then
+            # we include a unique identifier in the end of it, in the
+            # form of '<<N>>' (where N is a number >= 2).  This is
+            # useful because the GDB testsuite is full of non-unique
+            # test messages; however, if you process the racy summary
+            # file you will also need to perform this same operation
+            # in order to identify the racy test.
+            i = 2
+            while True:
+                nname = test_name + ' <<' + str (i) + '>>'
+                if nname not in dic[result]:
+                    break
+                i += 1
+            test_name = nname
+        dic[result].add (test_name)
+
+def read_sum_files (files):
+    """Read the sumfiles (passed as a list in the FILES variable), and
+process each one, filling the FILES_AND_TESTS global dictionary with
+information about them. """
+    global files_and_tests
+
+    for x in files:
+        with open (x, 'r') as f:
+            files_and_tests[x] = dict ()
+            for line in f.readlines ():
+                parse_sum_line (line, files_and_tests[x])
+
+def identify_racy_tests ():
+    """Identify and print the racy tests.  This function basically works
+on sets, and the idea behind it is simple.  It takes all the sets that
+refer to the same result (for example, all the sets that contain PASS
+tests), and compare them.  If a test is present in all PASS sets, then
+it is not racy.  Otherwise, it is.
+
+This function does that for all sets (PASS, FAIL, KPASS, KFAIL, etc.),
+and then print a sorted list (without duplicates) of all the tests
+that were found to be racy."""
+    global files_and_tests
+
+    # First, construct two dictionaries that will hold one set of
+    # testcases for each state (PASS, FAIL, etc.).
+    #
+    # Each set in NONRACY_TESTS will contain only the non-racy
+    # testcases for that state.  A non-racy testcase is a testcase
+    # that has the same state in all test runs.
+    #
+    # Each set in ALL_TESTS will contain all tests, racy or not, for
+    # that state.
+    nonracy_tests = dict ()
+    all_tests = dict ()
+    for f in files_and_tests:
+        for state in files_and_tests[f]:
+            try:
+                nonracy_tests[state] &= files_and_tests[f][state].copy ()
+            except KeyError:
+                nonracy_tests[state] = files_and_tests[f][state].copy ()
+
+            try:
+                all_tests[state] |= files_and_tests[f][state].copy ()
+            except KeyError:
+                all_tests[state] = files_and_tests[f][state].copy ()
+
+    # Now, we eliminate the tests that are present in states that need
+    # to be ignored.  For example, tests both in the PASS and KFAIL
+    # states should not be considered racy.
+    ignored_tests = set ()
+    for s1, s2 in ignore_relations.iteritems ():
+        try:
+            ignored_tests |= (all_tests[s1] & all_tests[s2])
+        except KeyError:
+            continue
+
+    racy_tests = set ()
+    for f in files_and_tests:
+        for state in files_and_tests[f]:
+            racy_tests |= files_and_tests[f][state] - nonracy_tests[state]
+
+    racy_tests = racy_tests - ignored_tests
+
+    # Print the header.
+    print "\t\t=== gdb tests ===\n"
+
+    # Print each test.
+    for line in sorted (racy_tests):
+        print line
+
+    # Print the summary.
+    print "\n"
+    print "\t\t=== gdb Summary ===\n"
+    print "# of racy testcases:\t\t%d" % len (racy_tests)
+
+if __name__ == '__main__':
+    if len (sys.argv) < 3:
+        # It only makes sense to invoke this program if you pass two
+        # or more files to be analyzed.
+        sys.exit ("Usage: %s [FILE] [FILE] ..." % sys.argv[0])
+    read_sum_files (sys.argv[1:])
+    identify_racy_tests ()
+    exit (0)


* Re: [PATCH] Improve analysis of racy testcases
  2016-02-25 18:34 ` Antoine Tremblay
@ 2016-03-01  5:58   ` Sergio Durigan Junior
  2016-03-14 12:32     ` Antoine Tremblay
  0 siblings, 1 reply; 15+ messages in thread
From: Sergio Durigan Junior @ 2016-03-01  5:58 UTC (permalink / raw)
  To: Antoine Tremblay; +Cc: GDB Patches

On Thursday, February 25 2016, Antoine Tremblay wrote:

> Sergio Durigan Junior writes:
>
>> This patch is a proposal to introduce some mechanisms to identify racy
>> testcases present in our testsuite.  As can be seen in previous
>> discussions, racy tests are really bothersome and cause our BuildBot to
>> pollute the gdb-testers mailing list with hundreds of false-positive
>> messages every month.  Hopefully, identifying these racy tests in
>> advance (and automatically) will contribute to reducing the noise
>> traffic to gdb-testers, maybe to the point where we will be able to send
>> the failure messages directly to the authors of the commits.
>> [...]
> Thanks for this!  This was quite a problem for me while testing on ARM.
> I'm testing it now...

Thanks!  Please let me know if you find anything wrong with the script.

> One note: maybe it would be nice to output the list of non-racy tests
> too, to be able to auto-build a list of tests to run from this, since
> I'm not sure you can set an exclusion list?

Hm, it would be possible to output the non-racy tests, but only to a
different file (instead of outputting to stdout, as I'm doing with the
racy tests).  Perhaps this could be a separate option to the script?
I'm not sure if the users would always want this information...

As for the exclusion list you mentioned, DejaGNU's runtest allows one to
specify a --ignore flag with the names of the tests you don't want to
run.  It should be possible to pass this via RUNTESTFLAGS, but I haven't
tried.  I'll give it a go tomorrow.

Cheers,

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/


* Re: [PATCH] Improve analysis of racy testcases
  2016-02-28 21:44   ` Sergio Durigan Junior
@ 2016-03-01 11:56     ` Pedro Alves
  2016-03-06  1:29       ` Sergio Durigan Junior
  0 siblings, 1 reply; 15+ messages in thread
From: Pedro Alves @ 2016-03-01 11:56 UTC (permalink / raw)
  To: Sergio Durigan Junior; +Cc: GDB Patches

On 02/28/2016 09:44 PM, Sergio Durigan Junior wrote:
> On Thursday, February 25 2016, Pedro Alves wrote:

>
>> I've now run "make check -j8 RACY_ITER=3" and got this:
>>
>> $ cat testsuite/racy.sum
>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 1), pending signal catch
>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 2), pending signal catch
>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 4: attach (pass 1), pending signal catch
>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 6: attach (pass 2), pending signal catch
>> gdb.threads/attach-into-signal.exp: threaded: attempt 1: attach (pass 1), pending signal catch
>> gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 1), pending signal catch
>> gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 2), pending signal catch
>> gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach
>> gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach (EPERM)
>> gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach
>> gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach (EPERM)
>> gdb.threads/fork-plus-threads.exp: detach-on-fork=off: only inferior 1 left
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: no threads left
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited (memory error) (PRMS: gdb/18749)
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: no threads left
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: no threads left
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (timeout) (PRMS: gdb/18749)
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: no threads left
>> gdb.threads/watchpoint-fork.exp: child: multithreaded: finish
>> gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint A after the second fork
>> gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint B after the second fork
>>

>> The gdb.threads/process-dies-while-handling-bp.exp ones are actually:
>>
>>   -PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
>>   +KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749)
>>
>> Test sum diffing should probably strip away tail ()s, and ignore PASS->KFAIL->PASS.
>
> I thought about stripping tail ()s away before comparing the names, but
> the problem is that maybe we'll miss a test that actually writes
> something meaningful inside the parentheses.  There isn't a strong
> convention advising testcase writers to not do that.  What do you think?

I think we should have that convention.  We already largely implicitly
have it, exactly because of "(PRMS: gdb/NNN)", "(timeout)" or "(eof)",
etc.

IOW, I think this should be interpreted as a regression in the
"whatever test" test:

  -PASS: gdb.base/foo.exp: whatever test
  +FAIL: gdb.base/foo.exp: whatever test (timeout)

If that actually strips something meaningful, I'd just file it under
the same bucket as non-unique test names, and fix it by tweaking the
test message.

>> The only thing I do wish we should do, is use the fruits of this
>> to somehow mark racy tests in the testsuite itself, instead of only
>> making the buildbot ignore them, so that local development benefits
>> as well.
>
> I totally agree, and also spent some time thinking about this problem,
> but I don't see an easy solution for that.  Racy testcases vary wildly
> between targets, GDB vs. gdbserver, CFLAGS, etc.
> We would have to maintain several lists of racy tests,

I wouldn't want to maintain separate lists at all.  Instead, I'd want
to mark testcases themselves with something like setup_kfail.
Testcases already call those depending on target/arch; I see no
difference.

Perhaps a problem with setup_kfail is that it generates a KPASS when
the race doesn't trigger.  We could add a new setup_racy_kfail that
generates a PASS on success and KFAIL on failure, but never a KPASS.
Perhaps we could have a racy_test_scope in the spirit of
with_test_prefix, which would automatically mark all tests in the scope
as racy, but I'm not sure we need or want that.
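A sketch of the outcome mapping being proposed (hypothetical: the real implementation would be a DejaGNU/Tcl proc, and the names below are invented; this only models the result table, contrasting the proposed setup_racy_kfail with today's setup_kfail):

```python
# Model of the proposed semantics: a test expected to be racy reports
# PASS when it passes and KFAIL when it fails, but never KPASS,
# whereas setup_kfail turns an unexpected pass into KPASS.

def outcome(passed, kfail=False, racy_kfail=False):
    if passed:
        return "KPASS" if kfail else "PASS"
    return "KFAIL" if (kfail or racy_kfail) else "FAIL"

assert outcome(True, kfail=True) == "KPASS"      # today's setup_kfail
assert outcome(True, racy_kfail=True) == "PASS"  # proposed: never KPASS
assert outcome(False, racy_kfail=True) == "KFAIL"
```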

> and even this wouldn't be 100%
> trustworthy because the racy tests also vary depending on the machine
> load...

Varying the load does not make the _set_ of racy tests vary.  It only
varies the chances that a racy test in that set actually triggers the
race.  A test in that set is still racy regardless.

Thanks,
Pedro Alves


* Re: [PATCH v2] Improve analysis of racy testcases
  2016-03-01  5:51 ` [PATCH v2] " Sergio Durigan Junior
@ 2016-03-01 12:41   ` Pedro Alves
  2016-03-06  1:44     ` Sergio Durigan Junior
  0 siblings, 1 reply; 15+ messages in thread
From: Pedro Alves @ 2016-03-01 12:41 UTC (permalink / raw)
  To: Sergio Durigan Junior, GDB Patches

On 03/01/2016 05:51 AM, Sergio Durigan Junior wrote:

>   # Note that we must resort to a recursive make invocation here,
> @@ -190,6 +197,26 @@ DO_RUNTEST = \
>   check-single:
>   	$(DO_RUNTEST) $(RUNTESTFLAGS) $(expanded_tests_or_none)
>
> +check-single-racy:

(...)

> +	$(srcdir)/analyze-racy-logs.py \
> +	  `ls racy_outputs/gdb*.sum` > racy.sum; \

This ls invocation needs updating.

> +	sed -n '/=== gdb Summary ===/,$$ p' racy.sum
> +


> diff --git a/gdb/testsuite/README b/gdb/testsuite/README
> index 6b59027..043a8bd 100644
> --- a/gdb/testsuite/README
> +++ b/gdb/testsuite/README
> @@ -50,6 +50,51 @@ to any non-empty value:
>   If you want to use runtest directly instead of using the Makefile, see
>   the description of GDB_PARALLEL below.
>
> +Racy testcases
> +**************
> +
> +Sometimes, new testcases are added to the testsuite that are not
> +entirely deterministic, and can randomly pass or fail.  We call them
> +"racy testcases", and they can be bothersome when one is comparing
> +different testsuite runs.  In order to help identifying them, it is
> +possible to run the tests several times in a row and ask the testsuite
> +machinery to analyze the results.  To do that, you need to specify the
> +RACY_ITER environment variable to make:

Nicer, thanks.  :-)


> +
> +    # Print the header.
> +    print "\t\t=== gdb tests ===\n"

"gdb racy tests" ?

> +
> +    # Print each test.
> +    for line in sorted (racy_tests):
> +        print line
> +
> +    # Print the summary.
> +    print "\n"
> +    print "\t\t=== gdb Summary ===\n"
> +    print "# of racy testcases:\t\t%d" % len (racy_tests)

AFAICS, it's # of racy tests, not # of racy testcases, right?

Feel free to push with the above addressed.

Thanks,
Pedro Alves


* Re: [PATCH] Improve analysis of racy testcases
  2016-03-01 11:56     ` Pedro Alves
@ 2016-03-06  1:29       ` Sergio Durigan Junior
  0 siblings, 0 replies; 15+ messages in thread
From: Sergio Durigan Junior @ 2016-03-06  1:29 UTC (permalink / raw)
  To: Pedro Alves; +Cc: GDB Patches

On Tuesday, March 01 2016, Pedro Alves wrote:

> On 02/28/2016 09:44 PM, Sergio Durigan Junior wrote:
>> On Thursday, February 25 2016, Pedro Alves wrote:
>
>>
>>> I've now run "make check -j8 RACY_ITER=3" and got this:
>>>
>>> $ cat testsuite/racy.sum
>>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 1), pending signal catch
>>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 2), pending signal catch
>>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 4: attach (pass 1), pending signal catch
>>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 6: attach (pass 2), pending signal catch
>>> gdb.threads/attach-into-signal.exp: threaded: attempt 1: attach (pass 1), pending signal catch
>>> gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 1), pending signal catch
>>> gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 2), pending signal catch
>>> gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach
>>> gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach (EPERM)
>>> gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach
>>> gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach (EPERM)
>>> gdb.threads/fork-plus-threads.exp: detach-on-fork=off: only inferior 1 left
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: no threads left
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited (memory error) (PRMS: gdb/18749)
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: no threads left
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: no threads left
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (timeout) (PRMS: gdb/18749)
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: no threads left
>>> gdb.threads/watchpoint-fork.exp: child: multithreaded: finish
>>> gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint A after the second fork
>>> gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint B after the second fork
>>>
>
>>> The gdb.threads/process-dies-while-handling-bp.exp ones are actually:
>>>
>>>   -PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
>>>   +KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749)
>>>
>>> Test sum diffing should probably strip away tail ()s, and ignore PASS->KFAIL->PASS.
>>
>> I thought about stripping tail ()s away before comparing the names, but
>> the problem is that maybe we'll miss a test that actually writes
>> something meaningful inside the parentheses.  There isn't a strong
>> convention advising testcase writers to not do that.  What do you think?
>
> I think we should have that convention.  We already largely implicitly
> have it, exactly because of "(PRMS: gdb/NNN)", "(timeout)" or "(eof)",
> etc.
>
> IOW, I think this should be interpreted as a regression in the
> "whatever test" test:
>
>  -PASS: gdb.base/foo.exp: whatever test
>  +FAIL: gdb.base/foo.exp: whatever test (timeout)
>
> If that actually strips something meaningful, I'd just file it under
> the same bucket as non-unique test names, and fix it by tweaking the
> test message.

Alright, makes sense to me.  I went ahead and also updated the wiki with
this rule:

  <https://sourceware.org/gdb/wiki/GDBTestcaseCookbook#Do_not_use_.22tail_parentheses.22_on_test_messages>

>>> The only thing I do wish we should do, is use the fruits of this
>>> to somehow mark racy tests in the testsuite itself, instead of only
>>> making the buildbot ignore them, so that local development benefits
>>> as well.
>>
>> I totally agree, and also spent some time thinking about this problem,
>> but I don't see an easy solution for that.  Racy testcases vary wildly
>> between targets, GDB vs. gdbserver, CFLAGS, etc.
>> We would have to maintain several lists of racy tests,
>
> I wouldn't want to maintain separate lists at all.  Instead, I'd want
> to mark testcases themselves with something like setup_kfail.
> Testcases already call those depending on target/arch; I see no
> difference.
>
> Perhaps a problem with setup_kfail is that it generates a KPASS when
> the race doesn't trigger.  We could add a new setup_racy_kfail that
> generates a PASS on success and KFAIL on failure, but never a KPASS.
> Perhaps we could have a racy_test_scope in the spirit of
> with_test_prefix, which would automatically mark all tests in the
> scope as racy, but I'm not sure we need or want that.

I liked the idea of a setup_racy_kfail (not sure about the
racy_test_scope thing...).  I will take a look at implementing that
later.

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/


* Re: [PATCH v2] Improve analysis of racy testcases
  2016-03-01 12:41   ` Pedro Alves
@ 2016-03-06  1:44     ` Sergio Durigan Junior
  2016-03-06 22:34       ` [commit/obvious] Set executable bit on analyze-racy-logs.py Sergio Durigan Junior
  0 siblings, 1 reply; 15+ messages in thread
From: Sergio Durigan Junior @ 2016-03-06  1:44 UTC (permalink / raw)
  To: Pedro Alves; +Cc: GDB Patches

On Tuesday, March 01 2016, Pedro Alves wrote:

> On 03/01/2016 05:51 AM, Sergio Durigan Junior wrote:
>
>>   # Note that we must resort to a recursive make invocation here,
>> @@ -190,6 +197,26 @@ DO_RUNTEST = \
>>   check-single:
>>   	$(DO_RUNTEST) $(RUNTESTFLAGS) $(expanded_tests_or_none)
>>
>> +check-single-racy:
>
> (...)
>
>> +	$(srcdir)/analyze-racy-logs.py \
>> +	  `ls racy_outputs/gdb*.sum` > racy.sum; \
>
> This ls invocation needs updating.

Fixed.

>> +	sed -n '/=== gdb Summary ===/,$$ p' racy.sum
>> +
>
>
>> diff --git a/gdb/testsuite/README b/gdb/testsuite/README
>> index 6b59027..043a8bd 100644
>> --- a/gdb/testsuite/README
>> +++ b/gdb/testsuite/README
>> @@ -50,6 +50,51 @@ to any non-empty value:
>>   If you want to use runtest directly instead of using the Makefile, see
>>   the description of GDB_PARALLEL below.
>>
>> +Racy testcases
>> +**************
>> +
>> +Sometimes, new testcases are added to the testsuite that are not
>> +entirely deterministic, and can randomly pass or fail.  We call them
>> +"racy testcases", and they can be bothersome when one is comparing
>> +different testsuite runs.  In order to help identifying them, it is
>> +possible to run the tests several times in a row and ask the testsuite
>> +machinery to analyze the results.  To do that, you need to specify the
>> +RACY_ITER environment variable to make:
>
> Nicer, thanks.  :-)
>
>
>> +
>> +    # Print the header.
>> +    print "\t\t=== gdb tests ===\n"
>
> "gdb racy tests" ?

Fixed.

>> +
>> +    # Print each test.
>> +    for line in sorted (racy_tests):
>> +        print line
>> +
>> +    # Print the summary.
>> +    print "\n"
>> +    print "\t\t=== gdb Summary ===\n"
>> +    print "# of racy testcases:\t\t%d" % len (racy_tests)
>
> AFAICS, it's # of racy tests, not # of racy testcases, right?

Yeah, fixed.

> Feel free to push with the above addressed.

Thanks, pushed.

  fb6a751f5f1fe7912e84dd90d06395c357da47c2

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/


* [commit/obvious] Set executable bit on analyze-racy-logs.py
  2016-03-06  1:44     ` Sergio Durigan Junior
@ 2016-03-06 22:34       ` Sergio Durigan Junior
  0 siblings, 0 replies; 15+ messages in thread
From: Sergio Durigan Junior @ 2016-03-06 22:34 UTC (permalink / raw)
  To: GDB Patches; +Cc: Sergio Durigan Junior

I forgot to do it in my previous commit.  This is necessary because we
execute the script directly from gdb/testsuite/Makefile.in.

gdb/testsuite/ChangeLog:
2016-03-06  Sergio Durigan Junior  <sergiodj@redhat.com>

	* analyze-racy-logs.py: Set executable bit.
---
 gdb/testsuite/ChangeLog            | 4 ++++
 gdb/testsuite/analyze-racy-logs.py | 0
 2 files changed, 4 insertions(+)
 mode change 100644 => 100755 gdb/testsuite/analyze-racy-logs.py

diff --git a/gdb/testsuite/ChangeLog b/gdb/testsuite/ChangeLog
index a96156d..668f9f0 100644
--- a/gdb/testsuite/ChangeLog
+++ b/gdb/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2016-03-06  Sergio Durigan Junior  <sergiodj@redhat.com>
+
+	* analyze-racy-logs.py: Set executable bit.
+
 2016-03-05  Sergio Durigan Junior  <sergiodj@redhat.com>
 
 	* Makefile.in (DEFAULT_RACY_ITER): New variable.
diff --git a/gdb/testsuite/analyze-racy-logs.py b/gdb/testsuite/analyze-racy-logs.py
old mode 100644
new mode 100755
-- 
2.4.3


* Re: [PATCH] Improve analysis of racy testcases
  2016-03-01  5:58   ` Sergio Durigan Junior
@ 2016-03-14 12:32     ` Antoine Tremblay
  2016-03-14 12:45       ` Pedro Alves
  2016-04-27 17:18       ` Antoine Tremblay
  0 siblings, 2 replies; 15+ messages in thread
From: Antoine Tremblay @ 2016-03-14 12:32 UTC (permalink / raw)
  To: Sergio Durigan Junior; +Cc: Antoine Tremblay, GDB Patches


Sergio Durigan Junior writes:

> On Thursday, February 25 2016, Antoine Tremblay wrote:
>
>> Sergio Durigan Junior writes:
>>
>>> This patch is a proposal to introduce some mechanisms to identify racy
>>> testcases present in our testsuite.  As can be seen in previous
>>> discussions, racy tests are really bothersome and cause our BuildBot to
>>> pollute the gdb-testers mailing list with hundreds of false-positives
>>> messages every month.  Hopefully, by identifying these racy tests in
>>> advance (and automatically) will contribute to the reduction of noise
>>> traffic to gdb-testers, maybe to the point where we will be able to send
>>> the failure messages directly to the authors of the commits.
>>> [...]
>> Thanks for this ! This was quite a problem for me while testing on arm.
>> I'm testing it now...
>
> Thanks!  Please let me know if you find anything wrong with the script.

Just to let you know that the script worked as expected during my
testing, and it's quite nice to see.  The tests, however, were flakier
than expected, so I can't use the output directly; I guess I'll need a
never-ignore-this-test list (like base/break.exp).  Still, it's very
helpful.

>
>> One note: maybe it would be nice to output the list of non-racy
>> tests too, to be able to auto-build a list of tests to run from
>> this, since I'm not sure you can set an exclusion list?
>
> Hm, it would be possible to output the non-racy tests, but only to a
> different file (instead of outputting to stdout, as I'm doing with the
> racy tests).  Perhaps this could be a separate option to the script?
> I'm not sure if the users would always want this information...
>
> As for the exclusion list you mentioned, DejaGNU's runtest allows one to
> specify a --ignore flag with the names of the tests you don't want to
> run.  It should be possible to pass this via RUNTESTFLAGS, but I haven't
> tried.  I'll give it a go tomorrow.

Great, I did not know that!  I'll give it a try, and if that works
there's no need to output the non-racy tests...

Thanks,
Antoine


* Re: [PATCH] Improve analysis of racy testcases
  2016-03-14 12:32     ` Antoine Tremblay
@ 2016-03-14 12:45       ` Pedro Alves
  2016-03-14 14:04         ` Antoine Tremblay
  2016-04-27 17:18       ` Antoine Tremblay
  1 sibling, 1 reply; 15+ messages in thread
From: Pedro Alves @ 2016-03-14 12:45 UTC (permalink / raw)
  To: Antoine Tremblay, Sergio Durigan Junior; +Cc: GDB Patches

On 03/14/2016 12:31 PM, Antoine Tremblay wrote:

> Just to let you know that the script worked as expected during my
> testing, and it's quite nice to see.  The tests, however, were flakier
> than expected, so I can't use the output directly; I guess I'll need a
> never-ignore-this-test list (like base/break.exp).  Still, it's very
> helpful.

Could you share a bit more on what you found?  I wouldn't expect break.exp
to be flaky.  I think some tests that use gdb_test_stdio are still
flaky with gdbserver, but break.exp does not seem to use it.

Thanks,
Pedro Alves


* Re: [PATCH] Improve analysis of racy testcases
  2016-03-14 12:45       ` Pedro Alves
@ 2016-03-14 14:04         ` Antoine Tremblay
  0 siblings, 0 replies; 15+ messages in thread
From: Antoine Tremblay @ 2016-03-14 14:04 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Antoine Tremblay, Sergio Durigan Junior, GDB Patches


Pedro Alves writes:

> On 03/14/2016 12:31 PM, Antoine Tremblay wrote:
>
>> Just to let you know that the script worked as expected during my
>> testing, and it's quite nice to see.  The tests, however, were
>> flakier than expected, so I can't use the output directly; I guess
>> I'll need a never-ignore-this-test list (like base/break.exp).
>> Still, it's very helpful.
>
> Could you share a bit more on what you found?  I wouldn't expect break.exp
> to be flaky.  I think some tests that use gdb_test_stdio are still
> flaky with gdbserver, but break.exp does not seem to use it.
>

It's flaky because of a weird problem I have while running GDBServer on
ARM: sometimes, out of nowhere, I will get a SIGILL just running a
program on GDBServer.  Even with no breakpoints or anything, just
loading the program in GDBServer and doing a continue can trigger it.

See this thread: https://www.sourceware.org/ml/gdb/2015-11/msg00030.html

So it's not that the test itself is flaky; it's more the platform in
general...

Regards,
Antoine


* Re: [PATCH] Improve analysis of racy testcases
  2016-03-14 12:32     ` Antoine Tremblay
  2016-03-14 12:45       ` Pedro Alves
@ 2016-04-27 17:18       ` Antoine Tremblay
  1 sibling, 0 replies; 15+ messages in thread
From: Antoine Tremblay @ 2016-04-27 17:18 UTC (permalink / raw)
  To: Antoine Tremblay; +Cc: Sergio Durigan Junior, GDB Patches


Antoine Tremblay writes:

>> As for the exclusion list you mentioned, DejaGNU's runtest allows one to
>> specify a --ignore flag with the names of the tests you don't want to
>> run.  It should be possible to pass this via RUNTESTFLAGS, but I haven't
>> tried.  I'll give it a go tomorrow.
>
> Great, I did not know that!  I'll give it a try, and if that works
> there's no need to output the non-racy tests...

Just FYI, I just had a chance to try this, and it works, except that
you need to omit the path of the test.  So, for example, to ignore
gdb.trace/actions-changed.exp you need:

    RUNTESTFLAGS='--ignore actions-changed.exp'

Not sure what happens if two tests have the same name but a different
directory...

Regards,
Antoine


end of thread, other threads:[~2016-04-27 17:18 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-17  1:26 [PATCH] Improve analysis of racy testcases Sergio Durigan Junior
2016-02-25 17:01 ` Pedro Alves
2016-02-28 21:44   ` Sergio Durigan Junior
2016-03-01 11:56     ` Pedro Alves
2016-03-06  1:29       ` Sergio Durigan Junior
2016-02-25 18:34 ` Antoine Tremblay
2016-03-01  5:58   ` Sergio Durigan Junior
2016-03-14 12:32     ` Antoine Tremblay
2016-03-14 12:45       ` Pedro Alves
2016-03-14 14:04         ` Antoine Tremblay
2016-04-27 17:18       ` Antoine Tremblay
2016-03-01  5:51 ` [PATCH v2] " Sergio Durigan Junior
2016-03-01 12:41   ` Pedro Alves
2016-03-06  1:44     ` Sergio Durigan Junior
2016-03-06 22:34       ` [commit/obvious] Set executable bit on analyze-racy-logs.py Sergio Durigan Junior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).