public inbox for gdb-prs@sourceware.org
* [Bug gdb/31069] New: Zombie leader detection racy
@ 2023-11-15 17:51 pedro at palves dot net
  2023-11-15 18:06 ` [Bug gdb/31069] " cvs-commit at gcc dot gnu.org
  0 siblings, 1 reply; 2+ messages in thread
From: pedro at palves dot net @ 2023-11-15 17:51 UTC
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=31069

            Bug ID: 31069
           Summary: Zombie leader detection racy
           Product: gdb
           Version: unknown
            Status: NEW
          Severity: normal
          Priority: P2
         Component: gdb
          Assignee: unassigned at sourceware dot org
          Reporter: pedro at palves dot net
  Target Milestone: ---

Simon noticed that gdb.threads/threads-after-exec.exp was racy.  You can
consistently reproduce it (at git hash
319b460545dc79280e2904dcc280057cf71fb753), with:

  $ taskset -c 0 make check TESTS="gdb.threads/threads-after-exec.exp"

This is yet another case of zombie leader detection making things a bit fuzzy.

In the passing case, we have:

 continue
 Continuing.
 [New Thread 0x7ffff7bff640 (LWP 603183)]
 [Thread 0x7ffff7bff640 (LWP 603183) exited]
 process 603180 is executing new program: .../gdb.threads/threads-after-exec/threads-after-exec

While in the failing case, we have (note remarks on the rhs):

 continue
 Continuing.
 [New Thread 0x7ffff7bff640 (LWP 600205)]
 [Thread 0x7ffff7f95740 (LWP 600202) exited]   <<< gdb deletes leader thread, thread 1.
 [New LWP 600202]                              <<< gdb adds it back -- this is now thread 3.
 [Thread 0x7ffff7bff640 (LWP 600205) exited]
 process 600202 is executing new program: .../threads-after-exec/threads-after-exec
 [Switching to process 600202]
 Thread 3 "threads-after-e" hit Catchpoint 2 (exec'd .../gdb.threads/threads-after-exec/threads-after-exec), 0x00007ffff7fe3290 in _start () from /lib64/ld-linux-x86-64.so.2

The testcase only has two threads, yet GDB presented the exec for thread 3.
This is GDB deleting the leader (the backend detected it was a zombie, due to
the exec), and then adding it back when it saw the exec event.  The testcase
isn't expecting the thread remaining after the exec to be anything other than
thread 1.
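
To make the numbering effect concrete, here is a toy model of the failing
sequence.  It is only a sketch, not GDB's code: the ThreadList class and its
methods are hypothetical, and it assumes that thread numbers are handed out
from a counter and never reused, which is what the log above suggests.

```python
from itertools import count


class ThreadList:
    """Toy model of a debugger thread list (hypothetical, not GDB's code)."""

    def __init__(self):
        self._next_num = count(1)  # assumption: numbers are never reused
        self.by_lwp = {}           # LWP id -> debugger thread number

    def add(self, lwp):
        self.by_lwp[lwp] = next(self._next_num)
        return self.by_lwp[lwp]

    def delete(self, lwp):
        del self.by_lwp[lwp]


def failing_sequence():
    threads = ThreadList()
    threads.add(600202)     # leader becomes thread 1
    threads.add(600205)     # second thread becomes thread 2
    threads.delete(600202)  # leader detected as zombie and deleted
    threads.add(600202)     # exec event seen: leader added back as thread 3
    threads.delete(600205)  # the non-leader thread is gone after the exec
    return threads.by_lwp
```

In this model the only surviving entry is LWP 600202 with number 3, which is
the situation the testcase was not prepared for.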

I'm not sure there's anything we can easily do on the gdb side.  Recreating
the leader thread is one option, but I'm not fully sure of the consequences.
For example, the previous thread 1 will probably still exist in the thread list
as THREAD_EXITED, if it was the selected thread.

Maybe we can make use of PTRACE_O_TRACEEXIT / PTRACE_EVENT_EXIT, and model a
"zombie" state in the core, so if the leader exits, we keep listing it, but GDB
wouldn't try to stop that thread or read its registers.  After an exec, the
zombie thread would go back to being a normal thread.  The next question would
be how to model this in the remote protocol.
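
As a rough illustration of the "zombie" state idea, the sketch below keeps the
leader in the list when its exit is reported and revives it in place on exec.
Everything here is hypothetical: the ThreadState enum, the ThreadList class and
the on_exit_event / on_exec_event handlers are illustrative names, not GDB
internals, and on_exit_event merely stands in for whatever would handle a
PTRACE_EVENT_EXIT report.

```python
from enum import Enum, auto


class ThreadState(Enum):
    LIVE = auto()
    ZOMBIE = auto()  # hypothetical core state, as proposed above


class ThreadList:
    """Toy model only; names and structure are assumptions."""

    def __init__(self):
        self.threads = {}  # LWP id -> [thread number, ThreadState]
        self._next_num = 1

    def add(self, lwp):
        self.threads[lwp] = [self._next_num, ThreadState.LIVE]
        self._next_num += 1

    def on_exit_event(self, lwp, is_leader):
        # Stand-in for handling an exit report for one thread.
        if is_leader:
            self.threads[lwp][1] = ThreadState.ZOMBIE  # keep listing it
        else:
            del self.threads[lwp]

    def can_stop_or_read_registers(self, lwp):
        # The core would skip zombies when stopping or reading registers.
        return self.threads[lwp][1] is ThreadState.LIVE

    def on_exec_event(self, leader_lwp):
        # After the exec only the leader's id survives; revive it in place.
        num = self.threads[leader_lwp][0]
        self.threads = {leader_lwp: [num, ThreadState.LIVE]}


def exec_from_non_leader():
    tl = ThreadList()
    tl.add(600202)  # leader, thread 1
    tl.add(600205)  # thread 2
    tl.on_exit_event(600202, is_leader=True)
    zombie_accessible = tl.can_stop_or_read_registers(600202)
    tl.on_exec_event(600202)
    return zombie_accessible, tl.threads[600202][0], len(tl.threads)
```

With this model the leader is never deleted, so the thread left after the exec
keeps number 1.  It says nothing about how the remote protocol would carry the
state.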

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug gdb/31069] Zombie leader detection racy
  2023-11-15 17:51 [Bug gdb/31069] New: Zombie leader detection racy pedro at palves dot net
@ 2023-11-15 18:06 ` cvs-commit at gcc dot gnu.org
  0 siblings, 0 replies; 2+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-15 18:06 UTC
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=31069

--- Comment #1 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d2eca84d73a66cf93acbf14522efc835e4446f57

commit d2eca84d73a66cf93acbf14522efc835e4446f57
Author: Pedro Alves <pedro@palves.net>
Date:   Tue Nov 14 11:47:15 2023 +0000

    Fix gdb.threads/threads-after-exec.exp race

    Simon noticed that gdb.threads/threads-after-exec.exp was racy.  You
    can consistently reproduce it (at git hash
    319b460545dc79280e2904dcc280057cf71fb753), with:

      $ taskset -c 0 make check TESTS="gdb.threads/threads-after-exec.exp"

    gdb.log shows:

      (...)
      Thread 3 "threads-after-e" hit Catchpoint 2 (exec'd .../gdb.threads/threads-after-exec/threads-after-exec), 0x00007ffff7fe3290 in _start () from /lib64/ld-linux-x86-64.so.2
      (gdb) PASS: gdb.threads/threads-after-exec.exp: continue until exec
      info threads
        Id   Target Id                         Frame
      * 3    process 1443269 "threads-after-e" 0x00007ffff7fe3290 in _start () from /lib64/ld-linux-x86-64.so.2
      (gdb) FAIL: gdb.threads/threads-after-exec.exp: info threads
      (...)
      maint info linux-lwps
      LWP Ptid          Thread ID
      1443269.1443269.0 1.3
      (gdb) FAIL: gdb.threads/threads-after-exec.exp: maint info linux-lwps

    The FAILs happen because the .exp file expects that after the exec,
    the only thread has GDB thread number 1, but it has number 3 instead.

    This is yet another case of zombie leader detection making things a
    bit fuzzy.

    In the passing case, we have:

     continue
     Continuing.
     [New Thread 0x7ffff7bff640 (LWP 603183)]
     [Thread 0x7ffff7bff640 (LWP 603183) exited]
     process 603180 is executing new program: .../gdb.threads/threads-after-exec/threads-after-exec

    While in the failing case, we have (note remarks on the rhs):

     continue
     Continuing.
     [New Thread 0x7ffff7bff640 (LWP 600205)]
     [Thread 0x7ffff7f95740 (LWP 600202) exited]   <<< gdb deletes leader thread, thread 1.
     [New LWP 600202]                              <<< gdb adds it back -- this is now thread 3.
     [Thread 0x7ffff7bff640 (LWP 600205) exited]
     process 600202 is executing new program: .../threads-after-exec/threads-after-exec

    The testcase only has two threads, yet GDB presented the exec for
    thread 3.  This is GDB deleting the leader (the backend detected it
    was zombie, due to the exec), and then adding the leader back when it
    saw the exec event.

    I've recorded some thoughts about this in PR gdb/31069.

    For now, this commit just makes the testcase cope with the non-one
    thread number, as the number is not important for what this test is
    exercising.

    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31069
    Change-Id: Id80b5c73f09c9e0005efeb494cca5d066ac3bbae
