From: "thiago.bauermann at linaro dot org" <sourceware-bugzilla@sourceware.org>
To: gdb-prs@sourceware.org
Subject: [Bug threads/26286] FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 1: break at break_fn: 1 (SIGTRAP)
Date: Tue, 19 Mar 2024 17:44:59 +0000
Message-ID: <bug-26286-4717-DbglqoNIix@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-26286-4717@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=26286

Thiago Jung Bauermann <thiago.bauermann at linaro dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thiago.bauermann at linaro dot org

--- Comment #5 from Thiago Jung Bauermann <thiago.bauermann at linaro dot org> ---
I also encountered this issue with the current master branch on three
machines: two x86_64-linux and one aarch64-linux. Carl Love also
reported in bug #31312 that he encountered the issue on a
powerpc64le-linux system. The aarch64 and powerpc64le machines had a
patch to fix bug #31312 applied.

In all cases it's necessary to keep running
attach-many-short-lived-threads.exp in a loop to reproduce the problem.
On one of the x86_64-linux machines it takes anywhere from 30 to 500
iterations to hit the problem, while on the other it took between 120
and 900 iterations. The aarch64-linux machine took ~2500 iterations. On
powerpc64le-linux, the problem happened in 3 iterations out of 500.

I do have Yama present on those machines, but it is disabled on all of
them:

  $ sysctl kernel.yama.ptrace_scope
  kernel.yama.ptrace_scope = 0

I also agree with Tom that if Yama were the problem, it would affect the
testcase in a different way.

The issue arises from this loop in linux_proc_attach_tgid_threads ():

  /* Scan the task list for existing threads.  While we go through the
     threads, new threads may be spawned.  Cycle through the list of
     threads until we have done two iterations without finding new
     threads.  */
  for (iterations = 0; iterations < 2; iterations++)
    {
      struct dirent *dp;

      new_threads_found = 0;
      while ((dp = readdir (dir.get ())) != NULL)
        {
          unsigned long lwp;

          /* Fetch one lwp.  */
          lwp = strtoul (dp->d_name, NULL, 10);
          if (lwp != 0)
            {
              ptid_t ptid = ptid_t (pid, lwp);

              if (attach_lwp (ptid))
                new_threads_found = 1;
            }
        }

      if (new_threads_found)
        {
          /* Start over.  */
          iterations = -1;
        }

      rewinddir (dir.get ());
    }

What happens is that two passes over the task list without seeing new
threads in linux_proc_attach_tgid_threads () aren't always enough for
GDB to know that it has attached to all inferior threads. So sometimes,
after this function returns, an inferior thread that GDB never attached
to trips on the breakpoint instruction that GDB inserted into the
inferior, producing the unexpected SIGTRAP that the test reports.

I'm not sure I would call this a bug; it's rather an issue that arises
from the way attach-many-short-lived-threads.c behaves: since the
program is constantly creating new threads, it's impossible for GDB to
know when it has attached to all of them and can therefore stop looking
for new threads to attach to.
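
For readers without the testcase handy, the workload has roughly this
shape (an illustrative sketch only, not the actual
attach-many-short-lived-threads.c, which is more elaborate):

  #include <pthread.h>
  #include <stddef.h>

  /* Illustrative sketch only -- not the real test source.  A thread
     function that returns immediately, so each thread's entry in
     /proc/PID/task exists only briefly.  */

  static void *
  short_lived_fn (void *arg)
  {
    return NULL;
  }

  int
  main (void)
  {
    /* Churn threads forever.  By the time a debugger has scanned
       /proc/PID/task, some of the threads it saw are already gone and
       new ones have appeared.  */
    while (1)
      {
        pthread_t thr;
        if (pthread_create (&thr, NULL, short_lived_fn, NULL) == 0)
          pthread_join (thr, NULL);
      }
  }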

The only way I can see to improve GDB's behaviour is to increase the number
of iterations of the loop that checks for new threads.

I suspected that the inferior's ability to create new threads is
proportional to the number of CPUs present in the system, so I was
going to make the number of iterations in
linux_proc_attach_tgid_threads () proportional to the number of CPUs.
However, of the machines I have at hand, the one where it takes longest
to reproduce the problem has the most CPUs (160, vs. 8 CPUs on the
other machines), so maybe we just have to find a magic iteration count
that works well for everybody who can reproduce the issue?
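
Just to make the first idea concrete, a CPU-scaled threshold could look
something like the sketch below. The helper name, the divisor of 4, and
the floor of 2 are all placeholders rather than a tested patch; the loop
quoted above would then compare iterations against this value instead of
the literal 2:

  #include <algorithm>
  #include <unistd.h>

  /* Hypothetical helper, not actual GDB code: how many consecutive
     passes over /proc/PID/task without finding a new thread we require
     before declaring the scan finished.  The divisor of 4 and the
     floor of 2 (today's fixed value) are arbitrary placeholders.  */

  static int
  attach_quiet_iterations ()
  {
    long ncpus = sysconf (_SC_NPROCESSORS_ONLN);
    if (ncpus < 1)
      ncpus = 1;
    return std::max (2L, ncpus / 4);
  }

That said, the numbers from the 160-CPU machine above don't really
support CPU-based scaling, so in practice this may reduce to simply
replacing the literal 2 with a larger hand-picked constant.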

-- 
You are receiving this mail because:
You are on the CC list for the bug.

Thread overview: 7+ messages
2020-07-22 19:08 [Bug threads/26286] New: " vries at gcc dot gnu.org
2020-07-22 19:08 ` [Bug threads/26286] " vries at gcc dot gnu.org
2020-07-27 14:17 ` vries at gcc dot gnu.org
2020-08-31  7:54 ` ianchi at andestech dot com
2020-08-31  8:30 ` vries at gcc dot gnu.org
2024-03-19 17:44 ` thiago.bauermann at linaro dot org [this message]
2024-05-17 18:08 ` pedro at palves dot net
