From: "dsmith at redhat dot com"
To: systemtap@sourceware.org
Subject: [Bug testsuite/20600] New: parallet testsuite hang in [nd_]syscall.exp
Date: Mon, 12 Sep 2016 14:58:00 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=20600

            Bug ID: 20600
           Summary: parallet testsuite hang in [nd_]syscall.exp
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: testsuite
          Assignee: systemtap at sourceware dot org
          Reporter: dsmith at redhat dot com
  Target Milestone: ---

When I run the testsuite in parallel mode with at least 3 concurrent jobs, I'm getting a testsuite "hang". The testsuite will run to completion, except for either the syscall.exp or the nd_syscall.exp test case. That test case will hang in one of its subtests, typically the execve or getrlimit subtest.
The stapio process for that test is in the defunct state:

====
# ps ax | fgrep stap
14534 pts/0    S+     0:00 grep -F --color=auto stap
24933 ?        Zl     0:10 [stapio]
# tail testsuite/artifacts/systemtap.syscall/nd_syscall/systemtap.log
Executing on host: gcc /root/src/testsuite/systemtap.syscall/getpriority.c  -lrt -lm  -o /root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptestgbSi0f/getpriority    (timeout = 300)
spawn -ignore SIGHUP gcc /root/src/testsuite/systemtap.syscall/getpriority.c -lrt -lm -o /root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptestgbSi0f/getpriority
PASS: 64-bit getpriority nd_syscall
Testing 64-bit getrandom nd_syscall
Executing on host: gcc /root/src/testsuite/systemtap.syscall/getrandom.c  -lrt -lm  -o /root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptest9QHupy/getrandom    (timeout = 300)
spawn -ignore SIGHUP gcc /root/src/testsuite/systemtap.syscall/getrandom.c -lrt -lm -o /root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptest9QHupy/getrandom
PASS: 64-bit getrandom nd_syscall
Testing 64-bit getrlimit nd_syscall
Executing on host: gcc /root/src/testsuite/systemtap.syscall/getrlimit.c  -lrt -lm  -o /root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptest4a2xe9/getrlimit    (timeout = 300)
spawn -ignore SIGHUP gcc /root/src/testsuite/systemtap.syscall/getrlimit.c -lrt -lm -o /root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptest4a2xe9/getrlimit
# ll testsuite/artifacts/systemtap.syscall/nd_syscall/systemtap.log
-rwxr-xr-x. 1 root root 21289 Sep 10 01:19 testsuite/artifacts/systemtap.syscall/nd_syscall/systemtap.lo
====

So, for over 9 hours that test has just sat there. If I do a 'kill -9' on that defunct stapio process, the [nd_]syscall.exp test will finish (and the full testsuite will also finish).
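To check for the state shown above without reading ps output by eye, a small scanner over /proc could be used. This is a minimal sketch, assuming the Linux /proc/<pid>/stat layout documented in proc(5); parse_proc_stat and find_zombies are hypothetical helpers, not part of systemtap or its testsuite. It reports each process named stapio in state Z along with its thread count. A zombie that still has more than one thread (the "l" in "Zl") typically means the main thread has exited while other threads are still running, which might be consistent with 'kill -9' letting the test finish.

```python
# Hypothetical diagnostic helper; assumes the Linux proc(5) stat layout.
import os


def parse_proc_stat(text):
    """Parse one /proc/<pid>/stat line.

    Returns (pid, comm, state, num_threads).
    """
    # comm (field 2) is wrapped in parentheses and may itself contain
    # spaces or ')', so split around the first '(' and the LAST ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    rest = text[rparen + 1:].split()  # rest[0] is field 3 (state)
    state = rest[0]
    num_threads = int(rest[17])       # rest[17] is field 20 (num_threads)
    return pid, comm, state, num_threads


def find_zombies(name="stapio", proc="/proc"):
    """Return [(pid, num_threads)] for processes named `name` in state Z."""
    hits = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/self
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state, nthreads = parse_proc_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited while we were scanning
        if comm == name and state == "Z":
            hits.append((pid, nthreads))
    return hits


if __name__ == "__main__":
    for pid, nthreads in find_zombies():
        # More than one thread on a zombie suggests the main thread exited
        # while other threads are still alive (ps would show "Zl").
        print(f"stapio pid {pid}: state Z, {nthreads} thread(s)")
```

Run as root on the affected host while the test is stuck; an empty result means no stapio zombie was found at that moment.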
Note that on the same system the full testsuite (and the [nd_]syscall.exp test cases) will run to completion when run in non-parallel mode. This "hang" is fairly repeatable, happening at least 50% of the time. I'd guess that one of the other tests is interfering with the [nd_]syscall.exp test case somehow, but I can't quite think of how.