From: "thiago.bauermann at linaro dot org" <sourceware-bugzilla@sourceware.org>
To: gdb-prs@sourceware.org
Subject: [Bug testsuite/31312] attach-many-short-lived-threads gives inconsistent results
Date: Sat, 16 Mar 2024 01:37:28 +0000
Message-ID: <bug-31312-4717-wCCraU5fd2@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-31312-4717@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=31312

--- Comment #17 from Thiago Jung Bauermann <thiago.bauermann at linaro dot org> ---
Created attachment 15405
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15405&action=edit
Shameless workaround.

I'm not certain of anything yet, but I have a few suspicions, and a workaround
(attached).

I believe (still have to confirm this) that when GDB is stuck using 100% of the
CPU, it is spinning in this loop in linux_proc_attach_tgid_threads ():

  /* Scan the task list for existing threads.  While we go through the
     threads, new threads may be spawned.  Cycle through the list of
     threads until we have done two iterations without finding new
     threads.  */
  for (iterations = 0; iterations < 20; iterations++)
    {
      struct dirent *dp;

      new_threads_found = 0;
      while ((dp = readdir (dir.get ())) != NULL)
        {
          unsigned long lwp;

          /* Fetch one lwp.  */
          lwp = strtoul (dp->d_name, NULL, 10);
          if (lwp != 0)
            {
              ptid_t ptid = ptid_t (pid, lwp);

              if (attach_lwp (ptid))
                new_threads_found = 1;
            }
        }

      if (new_threads_found)
        {
          /* Start over.  */
          iterations = -1;
        }

      rewinddir (dir.get ());
    }

In this case, the attach_lwp function pointer being called is
attach_proc_task_lwp_callback (), and the relevant part of it is:

  if (ptrace (PTRACE_ATTACH, lwpid, 0, 0) < 0)
    {
      int err = errno;

      /* Be quiet if we simply raced with the thread exiting.
         EPERM is returned if the thread's task still exists, and
         is marked as exited or zombie, as well as other
         conditions, so in that case, confirm the status in
         /proc/PID/status.  */
      if (err == ESRCH
          || (err == EPERM && linux_proc_pid_is_gone (lwpid)))
        {
          linux_nat_debug_printf
            ("Cannot attach to lwp %d: thread is gone (%d: %s)",
             lwpid, err, safe_strerror (err));
        }
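
For illustration, the check on the thread state can be sketched as below. This
is a minimal sketch and not GDB's implementation of linux_proc_pid_is_gone ():
the helper name status_text_is_gone is made up for this example, and it parses
the text of a /proc/PID/status file passed in as a string, assuming the usual
"State:" line whose first letter is Z for zombie or X for dead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not GDB's code: return true if the text of a
   /proc/PID/status file reports state Z (zombie) or X (dead).  */
static bool
status_text_is_gone (const char *text)
{
  const char *line = text;

  assert (text != NULL);

  while (line != NULL && *line != '\0')
    {
      if (strncmp (line, "State:", 6) == 0)
        {
          const char *p = line + 6;

          /* Skip the whitespace between the field name and the state
             letter.  */
          while (*p == ' ' || *p == '\t')
            p++;
          return *p == 'Z' || *p == 'X';
        }

      /* Advance to the start of the next line, if any.  */
      line = strchr (line, '\n');
      if (line != NULL)
        line++;
    }

  /* No State: field found, so do not claim the thread is gone.  */
  return false;
}
```

In this sketch a status text without a State: line is treated as "not gone";
that choice is an assumption of the example, not a statement about what GDB
does.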

So this is what I think is going on (again, I still need to confirm):

1. linux_proc_attach_tgid_threads () loops through tasks in /proc/PID/task,
calling attach_proc_task_lwp_callback () on each of them.

2. ptrace (PTRACE_ATTACH) returns -1 with errno = EPERM, causing
linux_proc_pid_is_gone () to get called.

3. linux_proc_pid_is_gone () opens /proc/LWP/status and sees that the thread
state is zombie or dead.

4. attach_proc_task_lwp_callback () returns 1, indicating that a new thread was
found, even though the attach failed.

5. linux_proc_attach_tgid_threads () sets new_threads_found = 1 and loops
again, finding the same thread in /proc/PID/task again because for some reason
the kernel isn't removing its proc entry any time soon.

6. GOTO 1.

So my suspicion is that what is confusing GDB is that the kernel (probably!
have to confirm...) is keeping the /proc entry for zombie and dead threads
around indefinitely.
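
The suspected cycle in steps 1-6 can be reproduced in isolation with a small
simulation. This is only a sketch under the assumptions stated above, not GDB
code: struct fake_task, fake_attach_lwp and count_scans are made-up names, the
task list is an in-memory array instead of /proc/PID/task, and a max_scans cap
is added so that the simulation always terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry under /proc/PID/task.  */
struct fake_task
{
  unsigned long lwp;
  bool zombie;    /* Entry lingers, but attaching to it would fail.  */
  bool attached;  /* Already attached on an earlier pass.  */
};

/* Mimics the suspected callback behavior: report "new thread found"
   both for a successful first attach and for a zombie that could not
   be attached to.  */
static int
fake_attach_lwp (struct fake_task *t)
{
  if (t->zombie)
    return 1;  /* Thread is gone, yet the caller sees a "new" thread.  */
  if (t->attached)
    return 0;
  t->attached = true;
  return 1;
}

/* Same loop shape as the one quoted from
   linux_proc_attach_tgid_threads (), plus a cap on the total number of
   scans.  Returns the number of scans of the task list performed.  */
static int
count_scans (struct fake_task *tasks, size_t n, int quiet_passes,
             int max_scans)
{
  int scans = 0;

  assert (quiet_passes > 0 && max_scans > 0);

  for (int iterations = 0; iterations < quiet_passes; iterations++)
    {
      int new_threads_found = 0;

      if (scans == max_scans)
        break;
      scans++;

      for (size_t i = 0; i < n; i++)
        if (fake_attach_lwp (&tasks[i]))
          new_threads_found = 1;

      if (new_threads_found)
        iterations = -1;  /* Start over.  */
    }

  return scans;
}
```

With two live tasks and quiet_passes = 2, count_scans returns 3: one scan that
attaches to both, then two quiet scans. As soon as one entry is marked zombie,
every scan reports a new thread and the loop only stops at max_scans, which is
the behavior I suspect GDB is showing.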

Anyway, regarding the workaround: it's not very satisfying, because increasing
the number of iterations in linux_proc_attach_tgid_threads () goes back to the
heuristic that Pedro's commit 8784d56326e7 ("Linux: on attach, attach to lwps
listed under /proc/$pid/task/") removed. Not increasing it makes GDB leave some
threads unattached, and the inferior then dies with a SIGTRAP due to the
breakpoint (which is exactly the scenario the testcase is designed to catch).
Even with 20 iterations, the problem still triggers relatively easily for me,
after 100 tries of running the testcase in a loop.


