From: "cel at linux dot ibm.com"
To: gdb-prs@sourceware.org
Subject: [Bug testsuite/31312] attach-many-short-lived-threads gives inconsistent results
Date: Tue, 06 Feb 2024 18:59:47 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=31312

--- Comment #9 from Carl E Love ---

I spent time playing with changing the timeout with no luck.  So, looking more
carefully at the test and the output,...

In the expect script we have the following check:

    gdb_test_multiple "attach $testpid" $test {
        ...
        -re "Cannot attach to lwp $decimal: Operation not permitted" {
            # On Linux, PTRACE_ATTACH sometimes fails with
            # EPERM, even though /proc/PID/status indicates
            # the thread is running.
            set eperm 1
            exp_continue
        }
        ...
        -re "$gdb_prompt $" {
            if {$eperm} {
                xfail "$test (EPERM)"
            } else {
                pass $test
            }

When I look at the log file, with some additional puts statements to print the
testpid, I see that once we hit the above result, we do an XFAIL.  Then the
test loops around and tries to attach to the same testpid again.  This time it
times out, and the rest of the tests time out for all of the remaining
iterations.

The comment in the test seems to imply that the test expects the situation to
be transient and that the next attempt to attach should succeed.  Well, that
doesn't seem to be the case, at least on Power 10.  So, it seems we need to
"fix" the handling for this error?

A few possibilities come to mind: 1) just exit the test on this failure; 2)
try sleeping a little in the hope that the "issue" will clear up and the next
attach will succeed; 3) get a new testpid and continue the test.

1)  I am not really excited by this option, in that if the failure occurred on
the first iteration then we really haven't tested things properly.

2)  I tried putting a sleep 1 in before the exp_continue.  Unfortunately, that
didn't fix things.  In one case, I got messages on a subsequent iteration that
the "program is no longer running".  In another case, things just timed out as
before.

3)  This option basically throws out the problem testpid and gets a new one.  I
tried this with the following change to the test:

diff --git a/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp b/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp
index 872473aa550..2b5c80e4323 100644
--- a/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp
+++ b/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp
@@ -87,6 +87,15 @@ proc test {} {
                    -re "$gdb_prompt $" {
                        if {$eperm} {
                            xfail "$test (EPERM)"
+                           # The attach failed.  No point in doing the rest
+                           # of the tests since we are not attached?  So
+                           # should we either 1) exit the test; or 2)
+                           # try again with a new testpid?
+                           puts "CARLL, xfail EPERM, testpid $testpid"
+
+                           # Try a new process
+                           set test_spawn_id [spawn_wait_for_attach $binfile]
+                           set testpid [spawn_id_get_pid $test_spawn_id]
                        } else {
                            pass $test
                        }

With this change we can complete all the test iterations, but with different
testpids.  Output from the modified test for one of my test runs:

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/carll/GDB/build-current/gdb/testsuite/../../../binutils-gdb-current/gdb/testsuite/config/unix.exp as tool-and-target-specific interface file.
Running /home/carll/GDB/build-current/gdb/testsuite/../../../binutils-gdb-current/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp ...
CARLL, timeout = 10
CARLL, run test on testpid = 3050726
CARLL, attempt = 1
CARLL, attempt = 2
CARLL, EPERM failue testpid = 3050726, attempt = 2
CARLL, xfail EPERM, testpid 3050726
CARLL, attempt = 3
CARLL, EPERM failue testpid = 3102706, attempt = 3
CARLL, xfail EPERM, testpid 3102706
CARLL, attempt = 4
CARLL, attempt = 5
CARLL, attempt = 6
CARLL, attempt = 7
CARLL, attempt = 8
CARLL, attempt = 9
CARLL, attempt = 10

                === gdb Summary ===

# of expected passes            87
# of expected failures          2

This fixes the failures on Power 10.  We still don't know the underlying
reason for the EPERM failure in the first place.  All we do is abandon that
pid and continue with a new one.

Any thoughts on other ways to handle the case of EPERM failure?  Is there a
better solution?  Thoughts?
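
One possible refinement of option 3, sketched here as plain Tcl rather than as
a patch: put a cap on how many times a pid may be abandoned, so that a
persistent EPERM shows up as a failure instead of being hidden behind an
endless series of new processes.  Everything in this sketch is hypothetical.
spawn_stub and attach_stub are stand-ins for spawn_wait_for_attach /
spawn_id_get_pid and for the gdb_test_multiple attach block, and max_respawns
is an assumed knob that does not exist in the test.

```tcl
# Hypothetical stand-ins so the sketch runs under plain tclsh.
set next_pid 1000

# Stand-in for spawning a fresh test process; returns a new fake pid.
proc spawn_stub {} {
    global next_pid
    return [incr next_pid]
}

# Stand-in for the attach: pretend pids up to 1002 always hit EPERM.
proc attach_stub {pid} {
    return [expr {$pid > 1002 ? "ok" : "eperm"}]
}

# Try to attach.  On EPERM, abandon the pid and spawn a new one, but
# give up once max_respawns replacement processes have been used.
proc attach_with_respawn {max_respawns} {
    set pid [spawn_stub]
    for {set respawns 0} {$respawns <= $max_respawns} {incr respawns} {
        if {[attach_stub $pid] eq "ok"} {
            return [list attached $pid $respawns]
        }
        if {$respawns < $max_respawns} {
            set pid [spawn_stub]
        }
    }
    return [list gave_up $pid $max_respawns]
}

puts [attach_with_respawn 3]
```

In the real test the "gave_up" case would presumably map to a fail or an
unresolved result rather than another xfail, and the abandoned process would
need to be cleaned up before spawning its replacement; neither is shown here.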