From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <sourceware-bugzilla@sourceware.org>
Received: by sourceware.org (Postfix, from userid 48)
	id A05783858C53; Tue,  6 Feb 2024 18:59:48 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A05783858C53
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1707245988;
	bh=PzEvcJhFtGgX4EYJYdiik6YtsBMuXykfo9oPJFPGAAc=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=JyW0LfPPkjjfTBXL2ZQviKrtt5Y2PiQhxA087csw3L4mbnzQWZEIcuATwUI/pTZNk
	 niZ0VtPp/br+m9+5px/xwy2qyS1rx3EzsU5K6nY3nzNcobWSHrVd7jBBYAWN+0qifH
	 VCFq+GBVmxGMacLvVXptAuwmp30ATzgfrmAa0EbA=
From: "cel at linux dot ibm.com" <sourceware-bugzilla@sourceware.org>
To: gdb-prs@sourceware.org
Subject: [Bug testsuite/31312] attach-many-short-lived-threads gives
 inconsistent results
Date: Tue, 06 Feb 2024 18:59:47 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gdb
X-Bugzilla-Component: testsuite
X-Bugzilla-Version: HEAD
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cel at linux dot ibm.com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-31312-4717-HyhsF5YeJ0@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-31312-4717@http.sourceware.org/bugzilla/>
References: <bug-31312-4717@http.sourceware.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gdb-prs.sourceware.org>

https://sourceware.org/bugzilla/show_bug.cgi?id=31312

--- Comment #9 from Carl E Love <cel at linux dot ibm.com> ---
I spent time experimenting with different timeout values, with no luck.  So I
looked more carefully at the test and its output.  In the expect script we
have the following check:

            gdb_test_multiple "attach $testpid" $test {
 ...
                -re "Cannot attach to lwp $decimal: Operation not permitted" {
                    # On Linux, PTRACE_ATTACH sometimes fails with
                    # EPERM, even though /proc/PID/status indicates
                    # the thread is running.
                    set eperm 1
                    exp_continue
                }
...
                -re "$gdb_prompt $" {
                    if {$eperm} {
                        xfail "$test (EPERM)"
                    } else {
                        pass $test
                    }


When I look at the log file, with some additional puts statements to print the
testpid, I see that once we hit the above case, the test reports an XFAIL.
Then the test loops around and tries to attach to the same testpid again.
This time the attach times out, and all the remaining tests time out in every
remaining iteration.

The comment in the test seems to imply that the test expects the situation to
be transient and that the next attempt to attach should succeed.  Well, that
doesn't seem to be the case, at least on Power 10.  So it seems we need to
"fix" the handling of this error?
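
The test comment says /proc/PID/status reports the thread as running when the
EPERM occurs.  For anyone who wants to check that independently, here is a
minimal sketch that reads the State field for every thread of a process.  It
is written in Python only so that it runs standalone; it is not GDB's code,
and the names thread_states and proc_root are made up for illustration.

```python
from pathlib import Path


def thread_states(pid, proc_root="/proc"):
    """Return {tid: state_letter} for every thread of pid.

    proc_root is a hypothetical parameter so the function can be pointed
    at a fake directory tree; on a real Linux system leave it as /proc.
    """
    states = {}
    for task in Path(proc_root, str(pid), "task").iterdir():
        try:
            text = (task / "status").read_text()
        except (FileNotFoundError, ProcessLookupError):
            # Short-lived threads can exit between listing the directory
            # and reading the file; skip them.
            continue
        for line in text.splitlines():
            if line.startswith("State:"):
                # The line looks like "State:\tR (running)"; keep the letter.
                states[int(task.name)] = line.split()[1]
                break
    return states
```

Calling thread_states(testpid) right after the "Operation not permitted"
message would show which state each remaining thread is in at that moment,
which might help narrow down why the later attach attempts time out.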

A few possibilities come to mind: 1) just exit the test on this failure; 2)
try sleeping a little in the hope that the "issue" will clear up and the next
attach will succeed; 3) get a new testpid and continue the test.

1)  I am not really excited by this option, because if the failure occurs on
the first iteration then we really haven't tested anything properly.

2)  I tried putting a sleep 1 in before the exp_continue.  Unfortunately, that
didn't fix things.  In one case, I got messages on a subsequent iteration that
the "program is no longer running".  In another case, things just timed out as
before.

3)  This option basically throws out the problem testpid and gets a new one.
I tried this with the following change to the test:
diff --git a/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp b/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp
index 872473aa550..2b5c80e4323 100644
--- a/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp
+++ b/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp
@@ -87,6 +87,15 @@ proc test {} {
                -re "$gdb_prompt $" {
                    if {$eperm} {
                        xfail "$test (EPERM)"
+                       # The attach failed.  No point in doing the rest
+                       # of the tests since we are not attached?  So
+                       # should we either 1) exit the test; or 2)
+                       # try again with a new testpid?
+                       puts "CARLL, xfail EPERM, testpid $testpid"
+
+                       # Try a new process
+                       set test_spawn_id [spawn_wait_for_attach $binfile]
+                       set testpid [spawn_id_get_pid $test_spawn_id]
                    } else {
                        pass $test
                    }

With this change, all the test iterations complete, although with different
testpids.  Output from the modified test for one of my test runs:

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/carll/GDB/build-current/gdb/testsuite/../../../binutils-gdb-current/gdb/testsuite/config/unix.exp as tool-and-target-specific interface file.
Running /home/carll/GDB/build-current/gdb/testsuite/../../../binutils-gdb-current/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp ...
CARLL, timeout = 10
CARLL, run test on testpid = 3050726
CARLL, attempt = 1
CARLL, attempt = 2
CARLL, EPERM failue testpid = 3050726, attempt = 2
CARLL, xfail EPERM, testpid 3050726
CARLL, attempt = 3
CARLL, EPERM failue testpid = 3102706, attempt = 3
CARLL, xfail EPERM, testpid 3102706
CARLL, attempt = 4
CARLL, attempt = 5
CARLL, attempt = 6
CARLL, attempt = 7
CARLL, attempt = 8
CARLL, attempt = 9
CARLL, attempt = 10

                === gdb Summary ===

# of expected passes            87
# of expected failures          2


This fixes the failures on Power 10.  We still don't know the underlying
reason for the EPERM failure in the first place.  All the change does is
abandon that pid and continue with a new one.
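
To make the control flow of the change easier to discuss, here is a rough
model of the abandon-and-respawn loop.  It is a Python sketch, not the Tcl
from the test: spawn and attach are hypothetical stand-ins for
spawn_wait_for_attach/spawn_id_get_pid and the attach command, and the third
pid in the example is made up because the log does not print it.  The sketch
does not model cleanup of the abandoned process.

```python
import itertools


class AttachEPERM(Exception):
    """Stands in for 'Cannot attach to lwp N: Operation not permitted'."""


def run_iterations(spawn, attach, iterations=10):
    """Attach `iterations` times, replacing the target pid after an EPERM.

    spawn()     -> pid of a freshly started test program (hypothetical)
    attach(pid) -> returns on success, raises AttachEPERM on failure
    Returns a list of (attempt, pid, result) tuples.
    """
    results = []
    pid = spawn()
    for attempt in range(1, iterations + 1):
        try:
            attach(pid)
        except AttachEPERM:
            results.append((attempt, pid, "XFAIL"))
            pid = spawn()  # abandon the problem pid, continue with a new one
        else:
            results.append((attempt, pid, "PASS"))
    return results


# Replay the shape of the run above: EPERM on attempts 2 and 3.
# 9999999 is an invented pid; the real third pid is not in the log.
pids = iter([3050726, 3102706, 9999999])
calls = itertools.count(1)


def fake_attach(pid):
    if next(calls) in (2, 3):
        raise AttachEPERM(pid)


results = run_iterations(lambda: next(pids), fake_attach)
```

In this model every EPERM costs one XFAIL and one extra spawned process, and
the remaining iterations run against whichever pid was spawned last.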

Any thoughts on other ways to handle the EPERM failure?  Is there a better
solution?
