From: "thiago.bauermann at linaro dot org"
To: gdb-prs@sourceware.org
Subject: [Bug testsuite/31312] attach-many-short-lived-threads gives inconsistent results
Date: Sat, 16 Mar 2024 01:37:28 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=31312

--- Comment #17 from Thiago Jung Bauermann ---
Created attachment 15405
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15405&action=edit
Shameless workaround.

I don't have any certainty yet, but I have a few suspicions, and the attached workaround...
I believe (I still have to confirm this) that when GDB is stuck using 100% of the CPU, it's here in linux_proc_attach_tgid_threads ():

    /* Scan the task list for existing threads.  While we go through
       the threads, new threads may be spawned.  Cycle through the
       list of threads until we have done two iterations without
       finding new threads.  */
    for (iterations = 0; iterations < 20; iterations++)
      {
        struct dirent *dp;

        new_threads_found = 0;
        while ((dp = readdir (dir.get ())) != NULL)
          {
            unsigned long lwp;

            /* Fetch one lwp.  */
            lwp = strtoul (dp->d_name, NULL, 10);
            if (lwp != 0)
              {
                ptid_t ptid = ptid_t (pid, lwp);

                if (attach_lwp (ptid))
                  new_threads_found = 1;
              }
          }

        if (new_threads_found)
          {
            /* Start over.  */
            iterations = -1;
          }

        rewinddir (dir.get ());
      }

In this case, the attach_lwp function pointer being called is attach_proc_task_lwp_callback (), and the relevant part of it is:

    if (ptrace (PTRACE_ATTACH, lwpid, 0, 0) < 0)
      {
        int err = errno;

        /* Be quiet if we simply raced with the thread exiting.
           EPERM is returned if the thread's task still exists, and
           is marked as exited or zombie, as well as other
           conditions, so in that case, confirm the status in
           /proc/PID/status.  */
        if (err == ESRCH
            || (err == EPERM && linux_proc_pid_is_gone (lwpid)))
          {
            linux_nat_debug_printf
              ("Cannot attach to lwp %d: thread is gone (%d: %s)",
               lwpid, err, safe_strerror (err));
          }

So this is what I think is going on (again, I still need to confirm):

1. linux_proc_attach_tgid_threads () loops through the tasks in /proc/PID/task, calling attach_proc_task_lwp_callback () on each of them.
2. ptrace (PTRACE_ATTACH) returns -1 with errno = EPERM, causing linux_proc_pid_is_gone () to be called.
3. linux_proc_pid_is_gone () opens /proc/LWP/status and sees that the thread state is zombie or dead.
4. attach_proc_task_lwp_callback () returns 1, indicating that a new thread was found.
5. linux_proc_attach_tgid_threads () sets new_threads_found = 1 and loops again, finding the same thread in /proc/PID/task because, for some reason, the kernel isn't removing its proc entry any time soon.
6. GOTO 1.

So my suspicion is that what confuses GDB is that the kernel (probably! I have to confirm...) keeps the /proc entries of zombie and dead threads around indefinitely.

Anyway, regarding the workaround: it's not very satisfying, because increasing the number of iterations in linux_proc_attach_tgid_threads () goes back to the heuristic that Pedro's commit 8784d56326e7 ("Linux: on attach, attach to lwps listed under /proc/$pid/task/") removed. Not increasing it makes GDB leave some threads unattached, and the inferior dies with a SIGTRAP due to the breakpoint (which is exactly the scenario the testcase is designed to catch). Using 20 still triggers the problem relatively easily for me, after 100 tries of running the testcase in a loop.