[PATCH 0/2] Exit during detach

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [PATCH 0/2] Exit during detach
@ 2023-07-18 14:31 Andrew Burgess
  2023-07-18 14:31 ` [PATCH 1/2] gdb: fix possible nullptr dereference in a remote_debug_printf call Andrew Burgess
  2023-07-18 14:31 ` [PATCH 2/2] gdb: handle main thread exiting during detach Andrew Burgess
  0 siblings, 2 replies; 12+ messages in thread
From: Andrew Burgess @ 2023-07-18 14:31 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

Ran into an issue in non-stop mode where an exit arriving during a
detach would cause GDB to crash.

---

Andrew Burgess (2):
  gdb: fix possible nullptr dereference in a remote_debug_printf call
  gdb: handle main thread exiting during detach

 gdb/linux-nat.c                               |  19 ++-
 gdb/remote.c                                  |  23 ++-
 .../main-thread-exit-during-detach.c          |  59 ++++++++
 .../main-thread-exit-during-detach.exp        | 143 ++++++++++++++++++
 gdbserver/mem-break.cc                        |   7 +
 5 files changed, 241 insertions(+), 10 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
 create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp


base-commit: b26b06dd42fbd9a75eebb4c943bf55a88562b81f
-- 
2.25.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] gdb: fix possible nullptr dereference in a remote_debug_printf call
  2023-07-18 14:31 [PATCH 0/2] Exit during detach Andrew Burgess
@ 2023-07-18 14:31 ` Andrew Burgess
  2023-07-19 15:01   ` Tom Tromey
  2023-07-18 14:31 ` [PATCH 2/2] gdb: handle main thread exiting during detach Andrew Burgess
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Burgess @ 2023-07-18 14:31 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

While working on the next patch I triggered a segfault from within the
function remote_target::discard_pending_stop_replies.  Turns out this
was caused by a cut&paste error introduced in this commit:

  commit df5ad102009c41ab4dfadbb8cfb8c8b2a02a4f78
  Date:   Wed Dec 1 09:40:03 2021 -0500

      gdb, gdbserver: detach fork child when detaching from fork parent

This commit adds a remote_debug_printf call that was copied from
earlier in the function, however, the new call wasn't updated to use
the appropriate local variable.  The local variable that it is using
might be nullptr, in which case we trigger undefined behaviour, and
could crash, which is what I was seeing.

Fixed by updating to use the correct local variable.
---
 gdb/remote.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index 7e3d6adfe4f..ff3d7e5cd32 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -7564,8 +7564,8 @@ remote_target::discard_pending_stop_replies (struct inferior *inf)
   for (auto it = iter; it != rs->stop_reply_queue.end (); ++it)
     remote_debug_printf
       ("discarding queued stop reply: ptid: %s, ws: %s\n",
-       reply->ptid.to_string().c_str(),
-       reply->ws.to_string ().c_str ());
+       (*it)->ptid.to_string().c_str(),
+       (*it)->ws.to_string ().c_str ());
   rs->stop_reply_queue.erase (iter, rs->stop_reply_queue.end ());
 }
 
-- 
2.25.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/2] gdb: handle main thread exiting during detach
  2023-07-18 14:31 [PATCH 0/2] Exit during detach Andrew Burgess
  2023-07-18 14:31 ` [PATCH 1/2] gdb: fix possible nullptr dereference in a remote_debug_printf call Andrew Burgess
@ 2023-07-18 14:31 ` Andrew Burgess
  2023-07-18 14:40   ` Andrew Burgess
  2023-08-03  9:50   ` [PATCHv2] " Andrew Burgess
  1 sibling, 2 replies; 12+ messages in thread
From: Andrew Burgess @ 2023-07-18 14:31 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

Overview
========

In non-stop mode, the main thread is running while a second thread is
stopped.  The user has the second thread selected as the current
thread and asks GDB to detach.  At the exact moment of detach the main
thread exits.  This situation causes crashes, assertion failures, and
unexpected errors to be reported from GDB for both native and remote
targets.

This commit addresses this situation for native and remote targets,
but the fixes are all different.

Native Linux Target
===================

This is caught in linux_nat_target::detach and the stop_wait_callback
notices that the thread has exited and cleans up.

GDB then detaches from everything except the main thread by calling
detach_callback.

After this the first problem is this assert:

  /* Only the initial process should be left right now.  */
  gdb_assert (num_lwps (pid) == 1);

The num_lwps call will return 0 as the main thread has exited and all
of the other threads have been detached.  I fix this by changing the
assert too:

  gdb_assert (num_lwps (pid) <= 1);

and improving the comment.

The next problem is that we do:

  main_lwp = find_lwp_pid (ptid_t (pid));

and then proceed assuming that main_lwp is not nullptr, however, as
the main thread has exited (num_lwps(pid) == 0), main_lwp is nullptr,
and so we crash trying to dereference the nullptr.  I fix this by
adding a nullptr check for main_lwp.  I think this is fine because the
actions we are skipping all relate to detaching from the main thread,
but we'll not be doing this as the thread has already exited!

For Remote Targets
==================

On remote targets there are two problems.  The first is that when the
exit occurs during the early phase of the detach, we see the stop
notification arrive while GDB is removing the breakpoints ahead of the
detach.  The 'set debug remote on' trace looks like this:

  [remote] Sending packet: $z0,7f1648fe0241,1#35
  [remote]   Notification received: Stop:W0;process:2a0ac8
  # At this point an unpatched gdbserver segfaults, and the connection
  # is broken.  A patched gdbserver continues as below...
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff00a8,1#68
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff132f,1#6b
  [remote] Packet received: E01
  [remote] Sending packet: $D;2a0ac8#3e
  [remote] Packet received: E01

I was originally running into Segmentation Faults, from within
gdbserver/mem-break.cc, in the function find_gdb_breakpoint.  This
function calls current_process() and then dereferences the result to
find the breakpoint list.

However, in our case, the current process has already exited, and so
the current_process() call returns nullptr.

It seems reasonable that in this case gdbserver should report a
failure to remove the breakpoint, and this is what I've done -- by
adding a nullptr check within find_gdb_breakpoint.

The second problem I ran into was on the gdb side, as the process has
already exited, but GDB has not yet acknowledged the exit event, the
detach -- the 'D' packet in the above trace -- fails.  This was being
reported to the user with a 'Can't detach process' error.  As the test
actually calls detach from Python code, this error was then becoming a
Python exception.

Though clearly the detach has returned an error, and so, maybe, having
GDB throw an error would be fine, I think in this case, there's a good
argument that the remote error should be ignored -- if GDB tries to
detach and gets back an error, and if there's a pending exit event for
the pid we tried to detach, then just ignore the error and pretend the
detach worked fine.

Testing
=======

In order to test this issue I needed to ensure that the exit event
arrives at the same time as the detach call.  The window of
opportunity for getting the exit to arrive is so small I've never
managed to trigger this in real use.

However, if we trigger both the exit and the detach from a single
Python function then we never return to GDB's event loop, as such GDB
never processes the exit event, and so the first time GDB gets a
chance to see the exit is during the detach call.  And so that is the
approach I've taken for testing this patch.
---
 gdb/linux-nat.c                               |  19 ++-
 gdb/remote.c                                  |  19 ++-
 .../main-thread-exit-during-detach.c          |  59 ++++++++
 .../main-thread-exit-during-detach.exp        | 143 ++++++++++++++++++
 gdbserver/mem-break.cc                        |   7 +
 5 files changed, 239 insertions(+), 8 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
 create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 73008a313c0..c3d62e0f85e 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1426,10 +1426,11 @@ linux_nat_target::detach (inferior *inf, int from_tty)
 
   iterate_over_lwps (ptid_t (pid), detach_callback);
 
-  /* Only the initial process should be left right now.  */
-  gdb_assert (num_lwps (pid) == 1);
-
-  main_lwp = find_lwp_pid (ptid_t (pid));
+  /* We have detached from everything except the main thread now, so
+     should only have one thread left.  However, in non-stop mode the
+     main thread might have exited, in which case we'll have no threads
+     left.  */
+  gdb_assert (num_lwps (pid) <= 1);
 
   if (forks_exist_p ())
     {
@@ -1443,10 +1444,14 @@ linux_nat_target::detach (inferior *inf, int from_tty)
     {
       target_announce_detach (from_tty);
 
-      /* Pass on any pending signal for the last LWP.  */
-      int signo = get_detach_signal (main_lwp);
+      main_lwp = find_lwp_pid (ptid_t (pid));
+      if (main_lwp != nullptr)
+	{
+	  /* Pass on any pending signal for the last LWP.  */
+	  int signo = get_detach_signal (main_lwp);
 
-      detach_one_lwp (main_lwp, &signo);
+	  detach_one_lwp (main_lwp, &signo);
+	}
 
       detach_success (inf);
     }
diff --git a/gdb/remote.c b/gdb/remote.c
index ff3d7e5cd32..f61035a26b0 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -6126,7 +6126,24 @@ remote_target::remote_detach_pid (int pid)
   else if (rs->buf[0] == '\0')
     error (_("Remote doesn't know how to detach"));
   else
-    error (_("Can't detach process."));
+    {
+      /* It is possible that we have an unprocessed exit event for this
+	 pid.  If this is the case then we can ignore the failure to detach
+	 and just pretend that the detach worked, as far as the user is
+	 concerned, the process exited immediately after the detach.  */
+      bool process_has_already_exited = false;
+      struct remote_notif_state *rns = rs->notif_state;
+      auto reply
+	= (struct stop_reply *) rns->pending_event[notif_client_stop.id];
+      if (reply != nullptr && reply->ptid.pid () == pid)
+	{
+	  if (reply->ws.kind () == TARGET_WAITKIND_EXITED)
+	    process_has_already_exited = true;
+	}
+
+      if (!process_has_already_exited)
+	error (_("Can't detach process: %s"), (char *) rs->buf.data ());
+    }
 }
 
 /* This detaches a program to which we previously attached, using
diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
new file mode 100644
index 00000000000..ecd0138c354
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
@@ -0,0 +1,59 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2023 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+
+/* This is set from GDB to allow the main thread to exit.  */
+
+volatile int dont_exit_just_yet = 1;
+
+/* Somewhere to place a breakpoint.  */
+
+void
+breakpt ()
+{
+  /* Just spin.  When the test is started under GDB we never enter the spin
+     loop, but when we attach, the worker thread will be spinning here.  */
+  while (dont_exit_just_yet)
+    sleep (1);
+}
+
+/* Thread function, doesn't do anything but hit a breakpoint.  */
+
+void *
+thread_worker (void *arg)
+{
+  breakpt ();
+  return NULL;
+}
+
+int
+main ()
+{
+  pthread_t thr;
+
+  /* Create a thread.  */
+  pthread_create (&thr, NULL, thread_worker, NULL);
+  pthread_detach (thr);
+
+  /* Spin until GDB releases us.  */
+  while (dont_exit_just_yet)
+    sleep (1);
+
+  _exit (0);
+}
diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
new file mode 100644
index 00000000000..97df3bd2783
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
@@ -0,0 +1,143 @@
+# Copyright 2023 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Check for a race condition where in non-stop mode, the user might
+# have a thread other than the main (original) thread selected and use
+# the 'detach' command.
+#
+# As GDB tries to detach it is possible that the main thread might
+# exit, the main thread is still running due to non-stop mode.
+#
+# GDB used to assume that the main thread would always exist when
+# processing the detach, clearly this isn't the case, and this
+# assumption would lead to assertion failures and segfaults.
+#
+# Triggering the precise timing is pretty hard, we need the main
+# thread to exit after the user has entered the 'detach' command, but
+# before GDB enters the detach implementation and stops all threads,
+# the window of opportunity for this bug is actually tiny.
+#
+# However, we can trigger this bug 100% from Python, as GDB's
+# event-loop only kicks in once we return from a Python function.
+# Thus, if we have a single Python function that causes the main
+# thread to exit, and then calls detach GDB will not have a chance to
+# handle the main thread exiting before entering the detach code.
+
+standard_testfile
+
+require allow_python_tests
+
+if [build_executable "failed to prepare" $testfile $srcfile {debug pthreads}] {
+    return -1
+}
+
+# Run the test.  When SPAWN_INFERIOR is true the inferior is started
+# as a separate process which GDB then attaches too.  When
+# SPAWN_INFERIOR is false the inferior is started directly within GDB.
+
+proc run_test { spawn_inferior } {
+    global srcfile
+    global gdb_prompt
+    global GDBFLAGS
+
+    save_vars { GDBFLAGS } {
+	append GDBFLAGS " -ex \"set non-stop on\""
+	clean_restart $::binfile
+    }
+
+    # Setup the inferior.  When complete the main thread (#1) will
+    # still be running (due to non-stop mode), while the worker thread
+    # (#2) will be stopped.
+    #
+    # There are two setup modes, when SPAWN_INFERIOR is true we span a
+    # separate process and attach to it, after the attach both threads
+    # are stopped, so it is necessary to resume thread #1.
+    #
+    # When SPAWN_INFERIOR is false we just start the inferior within
+    # GDB, in this case we place a breakpoint that will be hit by
+    # thread #2.  When the breakpoint is hit thread #1 will remain
+    # running.
+    if {$spawn_inferior} {
+	set test_spawn_id [spawn_wait_for_attach $::binfile]
+	set testpid [spawn_id_get_pid $test_spawn_id]
+
+	set escapedbinfile  [string_to_regexp $::binfile]
+	gdb_test -no-prompt-anchor "attach $testpid" \
+	    "Attaching to program.*`?$escapedbinfile'?, process $testpid.*" \
+	    "attach to the inferior"
+
+	# Attaching to a multi-threaded application in non-stop mode
+	# can result in thread stops being reported after the prompt
+	# is displayed.
+	#
+	# Send a simple command now just to resync the command prompt.
+	gdb_test "p 1 + 2" " = 3"
+
+	# Set thread 1 (the current thread) running again.
+	gdb_test "continue&"
+    } else {
+	if {![runto_main]} {
+	    return -1
+	}
+
+	gdb_breakpoint "breakpt"
+	gdb_continue_to_breakpoint "run to breakpoint"
+    }
+
+    # Switch to thread 2.
+    gdb_test "thread 2" \
+	[multi_line \
+	     "Switching to thread 2\[^\r\n\]*" \
+	     "#0\\s+.*"]
+
+    # Create a Python function that sets a variable in the inferior and
+    # then detaches.  Setting the variable in the inferior will allow the
+    # main thread to exit, we even sleep for a short while in order to
+    # give the inferior a chance to exit.
+    #
+    # However, we don't want GDB to notice the exit before we call detach,
+    # which is why we perform both these actions from a Python function.
+    gdb_test_multiline "Create worker function" \
+	"python" "" \
+	"import time" "" \
+	"def set_and_detach():" "" \
+	"   gdb.execute(\"set variable dont_exit_just_yet=0\")" "" \
+	"   time.sleep(1)" "" \
+	"   gdb.execute(\"detach\")" "" \
+	"end" ""
+
+    # The Python function performs two actions, the first causes the
+    # main thread to exit, while the second detaches from the inferior.
+    #
+    # In both cases the stop arrives while GDB is processing the
+    # detach, however, for remote targets GDB doesn't report the stop,
+    # while for local targets GDB does report the stop.
+    if {![gdb_is_target_remote]} {
+	set stop_re "\\\[Thread.*exited\\\]\r\n"
+    } else {
+	set stop_re ""
+    }
+    gdb_test "python set_and_detach()" \
+	"${stop_re}\\\[Inferior.*detached\\\]"
+}
+
+foreach_with_prefix spawn_inferior { true false } {
+    if {$spawn_inferior && ![can_spawn_for_attach]} {
+	# If spawning (and attaching too) a separate inferior is not
+	# supported for the current board, then lets not try too.
+	continue
+    }
+
+    run_test $spawn_inferior
+}
diff --git a/gdbserver/mem-break.cc b/gdbserver/mem-break.cc
index 897b9a2273c..12143caf104 100644
--- a/gdbserver/mem-break.cc
+++ b/gdbserver/mem-break.cc
@@ -976,6 +976,13 @@ static struct gdb_breakpoint *
 find_gdb_breakpoint (char z_type, CORE_ADDR addr, int kind)
 {
   struct process_info *proc = current_process ();
+
+  /* GDB might not yet be aware that the current process has exited and so
+     still requests that we modify (delete) a breakpoint.  Just claim we
+     can't find the breakpoint.  */
+  if (proc == nullptr)
+    return nullptr;
+
   struct breakpoint *bp;
   enum bkpt_type type = Z_packet_to_bkpt_type (z_type);
 
-- 
2.25.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] gdb: handle main thread exiting during detach
  2023-07-18 14:31 ` [PATCH 2/2] gdb: handle main thread exiting during detach Andrew Burgess
@ 2023-07-18 14:40   ` Andrew Burgess
  2023-08-03  9:50   ` [PATCHv2] " Andrew Burgess
  1 sibling, 0 replies; 12+ messages in thread
From: Andrew Burgess @ 2023-07-18 14:40 UTC (permalink / raw)
  To: gdb-patches

Andrew Burgess <aburgess@redhat.com> writes:

> Overview
> ========
>
> In non-stop mode, the main thread is running while a second thread is
> stopped.  The user has the second thread selected as the current
> thread and asks GDB to detach.  At the exact moment of detach the main
> thread exits.  This situation causes crashes, assertion failures, and
> unexpected errors to be reported from GDB for both native and remote
> targets.
>
> This commit addresses this situation for native and remote targets,
> but the fixes are all different.
>
> Native Linux Target
> ===================
>
> This is caught in linux_nat_target::detach and the stop_wait_callback
> notices that the thread has exited and cleans up.
>
> GDB then detaches from everything except the main thread by calling
> detach_callback.
>
> After this the first problem is this assert:
>
>   /* Only the initial process should be left right now.  */
>   gdb_assert (num_lwps (pid) == 1);
>
> The num_lwps call will return 0 as the main thread has exited and all
> of the other threads have been detached.  I fix this by changing the
> assert too:
>
>   gdb_assert (num_lwps (pid) <= 1);
>
> and improving the comment.
>
> The next problem is that we do:
>
>   main_lwp = find_lwp_pid (ptid_t (pid));
>
> and then proceed assuming that main_lwp is not nullptr, however, as
> the main thread has exited (num_lwps(pid) == 0), main_lwp is nullptr,
> and so we crash trying to dereference the nullptr.  I fix this by
> adding a nullptr check for main_lwp.  I think this is fine because the
> actions we are skipping all relate to detaching from the main thread,
> but we'll not be doing this as the thread has already exited!
>
> For Remote Targets
> ==================
>
> On remote targets there are two problems.  The first is that when the
> exit occurs during the early phase of the detach, we see the stop
> notification arrive while GDB is removing the breakpoints ahead of the
> detach.  The 'set debug remote on' trace looks like this:
>
>   [remote] Sending packet: $z0,7f1648fe0241,1#35
>   [remote]   Notification received: Stop:W0;process:2a0ac8
>   # At this point an unpatched gdbserver segfaults, and the connection
>   # is broken.  A patched gdbserver continues as below...
>   [remote] Packet received: E01
>   [remote] Sending packet: $z0,7f1648ff00a8,1#68
>   [remote] Packet received: E01
>   [remote] Sending packet: $z0,7f1648ff132f,1#6b
>   [remote] Packet received: E01
>   [remote] Sending packet: $D;2a0ac8#3e
>   [remote] Packet received: E01
>
> I was originally running into Segmentation Faults, from within
> gdbserver/mem-break.cc, in the function find_gdb_breakpoint.  This
> function calls current_process() and then dereferences the result to
> find the breakpoint list.
>
> However, in our case, the current process has already exited, and so
> the current_process() call returns nullptr.
>
> It seems reasonable that in this case gdbserver should report a
> failure to remove the breakpoint, and this is what I've done -- by
> adding a nullptr check within find_gdb_breakpoint.
>
> The second problem I ran into was on the gdb side, as the process has
> already exited, but GDB has not yet acknowledged the exit event, the
> detach -- the 'D' packet in the above trace -- fails.  This was being
> reported to the user with a 'Can't detach process' error.  As the test
> actually calls detach from Python code, this error was then becoming a
> Python exception.
>
> Though clearly the detach has returned an error, and so, maybe, having
> GDB throw an error would be fine, I think in this case, there's a good
> argument that the remote error should be ignored -- if GDB tries to
> detach and gets back an error, and if there's a pending exit event for
> the pid we tried to detach, then just ignore the error and pretend the
> detach worked fine.
>
> Testing
> =======
>
> In order to test this issue I needed to ensure that the exit event
> arrives at the same time as the detach call.  The window of
> opportunity for getting the exit to arrive is so small I've never
> managed to trigger this in real use.
>
> However, if we trigger both the exit and the detach from a single
> Python function then we never return to GDB's event loop, as such GDB
> never processes the exit event, and so the first time GDB gets a
> chance to see the exit is during the detach call.  And so that is the
> approach I've taken for testing this patch.
> ---
>  gdb/linux-nat.c                               |  19 ++-
>  gdb/remote.c                                  |  19 ++-
>  .../main-thread-exit-during-detach.c          |  59 ++++++++
>  .../main-thread-exit-during-detach.exp        | 143 ++++++++++++++++++
>  gdbserver/mem-break.cc                        |   7 +
>  5 files changed, 239 insertions(+), 8 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
>  create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
>
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 73008a313c0..c3d62e0f85e 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -1426,10 +1426,11 @@ linux_nat_target::detach (inferior *inf, int from_tty)
>  
>    iterate_over_lwps (ptid_t (pid), detach_callback);
>  
> -  /* Only the initial process should be left right now.  */
> -  gdb_assert (num_lwps (pid) == 1);
> -
> -  main_lwp = find_lwp_pid (ptid_t (pid));
> +  /* We have detached from everything except the main thread now, so
> +     should only have one thread left.  However, in non-stop mode the
> +     main thread might have exited, in which case we'll have no threads
> +     left.  */
> +  gdb_assert (num_lwps (pid) <= 1);
>  
>    if (forks_exist_p ())
>      {
> @@ -1443,10 +1444,14 @@ linux_nat_target::detach (inferior *inf, int from_tty)
>      {
>        target_announce_detach (from_tty);
>  
> -      /* Pass on any pending signal for the last LWP.  */
> -      int signo = get_detach_signal (main_lwp);
> +      main_lwp = find_lwp_pid (ptid_t (pid));
> +      if (main_lwp != nullptr)
> +	{
> +	  /* Pass on any pending signal for the last LWP.  */
> +	  int signo = get_detach_signal (main_lwp);
>  
> -      detach_one_lwp (main_lwp, &signo);
> +	  detach_one_lwp (main_lwp, &signo);
> +	}
>  
>        detach_success (inf);
>      }
> diff --git a/gdb/remote.c b/gdb/remote.c
> index ff3d7e5cd32..f61035a26b0 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -6126,7 +6126,24 @@ remote_target::remote_detach_pid (int pid)
>    else if (rs->buf[0] == '\0')
>      error (_("Remote doesn't know how to detach"));
>    else
> -    error (_("Can't detach process."));
> +    {
> +      /* It is possible that we have an unprocessed exit event for this
> +	 pid.  If this is the case then we can ignore the failure to detach
> +	 and just pretend that the detach worked, as far as the user is
> +	 concerned, the process exited immediately after the detach.  */
> +      bool process_has_already_exited = false;
> +      struct remote_notif_state *rns = rs->notif_state;
> +      auto reply
> +	= (struct stop_reply *) rns->pending_event[notif_client_stop.id];
> +      if (reply != nullptr && reply->ptid.pid () == pid)
> +	{
> +	  if (reply->ws.kind () == TARGET_WAITKIND_EXITED)
> +	    process_has_already_exited = true;
> +	}
> +
> +      if (!process_has_already_exited)
> +	error (_("Can't detach process: %s"), (char *) rs->buf.data ());
> +    }
>  }
>  
>  /* This detaches a program to which we previously attached, using
> diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
> new file mode 100644
> index 00000000000..ecd0138c354
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
> @@ -0,0 +1,59 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright 2023 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +#include <pthread.h>
> +#include <unistd.h>
> +
> +/* This is set from GDB to allow the main thread to exit.  */
> +
> +volatile int dont_exit_just_yet = 1;
> +
> +/* Somewhere to place a breakpoint.  */
> +
> +void
> +breakpt ()
> +{
> +  /* Just spin.  When the test is started under GDB we never enter the spin
> +     loop, but when we attach, the worker thread will be spinning here.  */
> +  while (dont_exit_just_yet)
> +    sleep (1);
> +}
> +
> +/* Thread function, doesn't do anything but hit a breakpoint.  */
> +
> +void *
> +thread_worker (void *arg)
> +{
> +  breakpt ();
> +  return NULL;
> +}
> +
> +int
> +main ()
> +{
> +  pthread_t thr;
> +

This test program is missing an 'alarm (300);' call, which I've now
added just here.  Apologies for missing this before posting.

Thanks,
Andrew

> +  /* Create a thread.  */
> +  pthread_create (&thr, NULL, thread_worker, NULL);
> +  pthread_detach (thr);
> +
> +  /* Spin until GDB releases us.  */
> +  while (dont_exit_just_yet)
> +    sleep (1);
> +
> +  _exit (0);
> +}
> diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
> new file mode 100644
> index 00000000000..97df3bd2783
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
> @@ -0,0 +1,143 @@
> +# Copyright 2023 Free Software Foundation, Inc.
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Check for a race condition where in non-stop mode, the user might
> +# have a thread other than the main (original) thread selected and use
> +# the 'detach' command.
> +#
> +# As GDB tries to detach it is possible that the main thread might
> +# exit, the main thread is still running due to non-stop mode.
> +#
> +# GDB used to assume that the main thread would always exist when
> +# processing the detach, clearly this isn't the case, and this
> +# assumption would lead to assertion failures and segfaults.
> +#
> +# Triggering the precise timing is pretty hard, we need the main
> +# thread to exit after the user has entered the 'detach' command, but
> +# before GDB enters the detach implementation and stops all threads,
> +# the window of opportunity for this bug is actually tiny.
> +#
> +# However, we can trigger this bug 100% from Python, as GDB's
> +# event-loop only kicks in once we return from a Python function.
> +# Thus, if we have a single Python function that causes the main
> +# thread to exit, and then calls detach GDB will not have a chance to
> +# handle the main thread exiting before entering the detach code.
> +
> +standard_testfile
> +
> +require allow_python_tests
> +
> +if [build_executable "failed to prepare" $testfile $srcfile {debug pthreads}] {
> +    return -1
> +}
> +
> +# Run the test.  When SPAWN_INFERIOR is true the inferior is started
> +# as a separate process which GDB then attaches too.  When
> +# SPAWN_INFERIOR is false the inferior is started directly within GDB.
> +
> +proc run_test { spawn_inferior } {
> +    global srcfile
> +    global gdb_prompt
> +    global GDBFLAGS
> +
> +    save_vars { GDBFLAGS } {
> +	append GDBFLAGS " -ex \"set non-stop on\""
> +	clean_restart $::binfile
> +    }
> +
> +    # Setup the inferior.  When complete the main thread (#1) will
> +    # still be running (due to non-stop mode), while the worker thread
> +    # (#2) will be stopped.
> +    #
> +    # There are two setup modes, when SPAWN_INFERIOR is true we span a
> +    # separate process and attach to it, after the attach both threads
> +    # are stopped, so it is necessary to resume thread #1.
> +    #
> +    # When SPAWN_INFERIOR is false we just start the inferior within
> +    # GDB, in this case we place a breakpoint that will be hit by
> +    # thread #2.  When the breakpoint is hit thread #1 will remain
> +    # running.
> +    if {$spawn_inferior} {
> +	set test_spawn_id [spawn_wait_for_attach $::binfile]
> +	set testpid [spawn_id_get_pid $test_spawn_id]
> +
> +	set escapedbinfile  [string_to_regexp $::binfile]
> +	gdb_test -no-prompt-anchor "attach $testpid" \
> +	    "Attaching to program.*`?$escapedbinfile'?, process $testpid.*" \
> +	    "attach to the inferior"
> +
> +	# Attaching to a multi-threaded application in non-stop mode
> +	# can result in thread stops being reported after the prompt
> +	# is displayed.
> +	#
> +	# Send a simple command now just to resync the command prompt.
> +	gdb_test "p 1 + 2" " = 3"
> +
> +	# Set thread 1 (the current thread) running again.
> +	gdb_test "continue&"
> +    } else {
> +	if {![runto_main]} {
> +	    return -1
> +	}
> +
> +	gdb_breakpoint "breakpt"
> +	gdb_continue_to_breakpoint "run to breakpoint"
> +    }
> +
> +    # Switch to thread 2.
> +    gdb_test "thread 2" \
> +	[multi_line \
> +	     "Switching to thread 2\[^\r\n\]*" \
> +	     "#0\\s+.*"]
> +
> +    # Create a Python function that sets a variable in the inferior and
> +    # then detaches.  Setting the variable in the inferior will allow the
> +    # main thread to exit, we even sleep for a short while in order to
> +    # give the inferior a chance to exit.
> +    #
> +    # However, we don't want GDB to notice the exit before we call detach,
> +    # which is why we perform both these actions from a Python function.
> +    gdb_test_multiline "Create worker function" \
> +	"python" "" \
> +	"import time" "" \
> +	"def set_and_detach():" "" \
> +	"   gdb.execute(\"set variable dont_exit_just_yet=0\")" "" \
> +	"   time.sleep(1)" "" \
> +	"   gdb.execute(\"detach\")" "" \
> +	"end" ""
> +
> +    # The Python function performs two actions, the first causes the
> +    # main thread to exit, while the second detaches from the inferior.
> +    #
> +    # In both cases the stop arrives while GDB is processing the
> +    # detach, however, for remote targets GDB doesn't report the stop,
> +    # while for local targets GDB does report the stop.
> +    if {![gdb_is_target_remote]} {
> +	set stop_re "\\\[Thread.*exited\\\]\r\n"
> +    } else {
> +	set stop_re ""
> +    }
> +    gdb_test "python set_and_detach()" \
> +	"${stop_re}\\\[Inferior.*detached\\\]"
> +}
> +
> +foreach_with_prefix spawn_inferior { true false } {
> +    if {$spawn_inferior && ![can_spawn_for_attach]} {
> +	# If spawning (and attaching too) a separate inferior is not
> +	# supported for the current board, then lets not try too.
> +	continue
> +    }
> +
> +    run_test $spawn_inferior
> +}
> diff --git a/gdbserver/mem-break.cc b/gdbserver/mem-break.cc
> index 897b9a2273c..12143caf104 100644
> --- a/gdbserver/mem-break.cc
> +++ b/gdbserver/mem-break.cc
> @@ -976,6 +976,13 @@ static struct gdb_breakpoint *
>  find_gdb_breakpoint (char z_type, CORE_ADDR addr, int kind)
>  {
>    struct process_info *proc = current_process ();
> +
> +  /* GDB might not yet be aware that the current process has exited and so
> +     still requests that we modify (delete) a breakpoint.  Just claim we
> +     can't find the breakpoint.  */
> +  if (proc == nullptr)
> +    return nullptr;
> +
>    struct breakpoint *bp;
>    enum bkpt_type type = Z_packet_to_bkpt_type (z_type);
>  
> -- 
> 2.25.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] gdb: fix possible nullptr dereference in a remote_debug_printf call
  2023-07-18 14:31 ` [PATCH 1/2] gdb: fix possible nullptr dereference in a remote_debug_printf call Andrew Burgess
@ 2023-07-19 15:01   ` Tom Tromey
  2023-08-03  9:07     ` Andrew Burgess
  0 siblings, 1 reply; 12+ messages in thread
From: Tom Tromey @ 2023-07-19 15:01 UTC (permalink / raw)
  To: Andrew Burgess via Gdb-patches; +Cc: Andrew Burgess

>>>>> "Andrew" == Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> writes:

Andrew> While working on the next patch I triggered a segfault from within the
Andrew> function remote_target::discard_pending_stop_replies.  Turns out this
Andrew> was caused by a cut&paste error introduced in this commit:

Andrew>   commit df5ad102009c41ab4dfadbb8cfb8c8b2a02a4f78
Andrew>   Date:   Wed Dec 1 09:40:03 2021 -0500

Andrew>       gdb, gdbserver: detach fork child when detaching from fork parent

Andrew> This commit adds a remote_debug_printf call that was copied from
Andrew> earlier in the function, however, the new call wasn't updated to use
Andrew> the appropriate local variable.  The local variable that it is using
Andrew> might be nullptr, in which case we trigger undefined behaviour, and
Andrew> could crash, which is what I was seeing.

Andrew> Fixed by updating to use the correct local variable.

Seems obvious to me FWIW.

Tom

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] gdb: fix possible nullptr dereference in a remote_debug_printf call
  2023-07-19 15:01   ` Tom Tromey
@ 2023-08-03  9:07     ` Andrew Burgess
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Burgess @ 2023-08-03  9:07 UTC (permalink / raw)
  To: Tom Tromey, Andrew Burgess via Gdb-patches

Tom Tromey <tom@tromey.com> writes:

>>>>>> "Andrew" == Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> writes:
>
> Andrew> While working on the next patch I triggered a segfault from within the
> Andrew> function remote_target::discard_pending_stop_replies.  Turns out this
> Andrew> was caused by a cut&paste error introduced in this commit:
>
> Andrew>   commit df5ad102009c41ab4dfadbb8cfb8c8b2a02a4f78
> Andrew>   Date:   Wed Dec 1 09:40:03 2021 -0500
>
> Andrew>       gdb, gdbserver: detach fork child when detaching from fork parent
>
> Andrew> This commit adds a remote_debug_printf call that was copied from
> Andrew> earlier in the function, however, the new call wasn't updated to use
> Andrew> the appropriate local variable.  The local variable that it is using
> Andrew> might be nullptr, in which case we trigger undefined behaviour, and
> Andrew> could crash, which is what I was seeing.
>
> Andrew> Fixed by updating to use the correct local variable.
>
> Seems obvious to me FWIW.

I updated the commit message to not mention "the next patch", and pushed
this fix.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv2] gdb: handle main thread exiting during detach
  2023-07-18 14:31 ` [PATCH 2/2] gdb: handle main thread exiting during detach Andrew Burgess
  2023-07-18 14:40   ` Andrew Burgess
@ 2023-08-03  9:50   ` Andrew Burgess
  2023-08-23 11:26     ` [PATCHv3] " Andrew Burgess
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Burgess @ 2023-08-03  9:50 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

Changes since v1:

  - Rebased to current upstream/master,

  - Extended a gdb_assert call, and added an additional gdb_assert
    call in linux-nat.c,

  - Added an extra comment in linux-nat.c,

  - Retested.

---

Overview
========

In non-stop mode, the main thread is running while a second thread is
stopped.  The user has the second thread selected as the current
thread and asks GDB to detach.  At the exact moment of detach the main
thread exits.  This situation causes crashes, assertion failures, and
unexpected errors to be reported from GDB for both native and remote
targets.

This commit addresses this situation for native and remote targets,
but the fixes are all different.

Native Linux Target
===================

This is caught in linux_nat_target::detach and the stop_wait_callback
notices that the thread has exited and cleans up.

GDB then detaches from everything except the main thread by calling
detach_callback.

After this the first problem is this assert:

  /* Only the initial process should be left right now.  */
  gdb_assert (num_lwps (pid) == 1);

The num_lwps call will return 0 as the main thread has exited and all
of the other threads have been detached.  I fix this by changing the
assert too:

  gdb_assert (num_lwps (pid) <= 1);

and improving the comment.

The next problem is that we do:

  main_lwp = find_lwp_pid (ptid_t (pid));

and then proceed assuming that main_lwp is not nullptr, however, as
the main thread has exited (num_lwps(pid) == 0), main_lwp is nullptr,
and so we crash trying to dereference the nullptr.  I fix this by
adding a nullptr check for main_lwp.  I think this is fine because the
actions we are skipping all relate to detaching from the main thread,
but we'll not be doing this as the thread has already exited!

For Remote Targets
==================

On remote targets there are two problems.  The first is that when the
exit occurs during the early phase of the detach, we see the stop
notification arrive while GDB is removing the breakpoints ahead of the
detach.  The 'set debug remote on' trace looks like this:

  [remote] Sending packet: $z0,7f1648fe0241,1#35
  [remote]   Notification received: Stop:W0;process:2a0ac8
  # At this point an unpatched gdbserver segfaults, and the connection
  # is broken.  A patched gdbserver continues as below...
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff00a8,1#68
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff132f,1#6b
  [remote] Packet received: E01
  [remote] Sending packet: $D;2a0ac8#3e
  [remote] Packet received: E01

I was originally running into Segmentation Faults, from within
gdbserver/mem-break.cc, in the function find_gdb_breakpoint.  This
function calls current_process() and then dereferences the result to
find the breakpoint list.

However, in our case, the current process has already exited, and so
the current_process() call returns nullptr.

It seems reasonable that in this case gdbserver should report a
failure to remove the breakpoint, and this is what I've done -- by
adding a nullptr check within find_gdb_breakpoint.

The second problem I ran into was on the gdb side, as the process has
already exited, but GDB has not yet acknowledged the exit event, the
detach -- the 'D' packet in the above trace -- fails.  This was being
reported to the user with a 'Can't detach process' error.  As the test
actually calls detach from Python code, this error was then becoming a
Python exception.

Though clearly the detach has returned an error, and so, maybe, having
GDB throw an error would be fine, I think in this case, there's a good
argument that the remote error should be ignored -- if GDB tries to
detach and gets back an error, and if there's a pending exit event for
the pid we tried to detach, then just ignore the error and pretend the
detach worked fine.

Testing
=======

In order to test this issue I needed to ensure that the exit event
arrives at the same time as the detach call.  The window of
opportunity for getting the exit to arrive is so small I've never
managed to trigger this in real use.

However, if we trigger both the exit and the detach from a single
Python function then we never return to GDB's event loop, as such GDB
never processes the exit event, and so the first time GDB gets a
chance to see the exit is during the detach call.  And so that is the
approach I've taken for testing this patch.
---
 gdb/linux-nat.c                               |  24 ++-
 gdb/remote.c                                  |  19 ++-
 .../main-thread-exit-during-detach.c          |  61 ++++++++
 .../main-thread-exit-during-detach.exp        | 143 ++++++++++++++++++
 gdbserver/mem-break.cc                        |   7 +
 5 files changed, 246 insertions(+), 8 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
 create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 250a8f43282..a25ce006a02 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1426,10 +1426,12 @@ linux_nat_target::detach (inferior *inf, int from_tty)
 
   iterate_over_lwps (ptid_t (pid), detach_callback);
 
-  /* Only the initial process should be left right now.  */
-  gdb_assert (num_lwps (pid) == 1);
-
-  main_lwp = find_lwp_pid (ptid_t (pid));
+  /* We have detached from everything except the main thread now, so
+     should only have one thread left.  However, in non-stop mode the
+     main thread might have exited, in which case we'll have no threads
+     left.  */
+  gdb_assert (num_lwps (pid) == 1
+	      || (target_is_non_stop_p () && num_lwps (pid) == 0));
 
   if (forks_exist_p ())
     {
@@ -1443,10 +1445,18 @@ linux_nat_target::detach (inferior *inf, int from_tty)
     {
       target_announce_detach (from_tty);
 
-      /* Pass on any pending signal for the last LWP.  */
-      int signo = get_detach_signal (main_lwp);
+      /* In non-stop mode it is possible that the main thread has exited,
+	 in which case we don't try to detach.  */
+      main_lwp = find_lwp_pid (ptid_t (pid));
+      if (main_lwp != nullptr)
+	{
+	  /* Pass on any pending signal for the last LWP.  */
+	  int signo = get_detach_signal (main_lwp);
 
-      detach_one_lwp (main_lwp, &signo);
+	  detach_one_lwp (main_lwp, &signo);
+	}
+      else
+	gdb_assert (target_is_non_stop_p ());
 
       detach_success (inf);
     }
diff --git a/gdb/remote.c b/gdb/remote.c
index ff3d7e5cd32..f61035a26b0 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -6126,7 +6126,24 @@ remote_target::remote_detach_pid (int pid)
   else if (rs->buf[0] == '\0')
     error (_("Remote doesn't know how to detach"));
   else
-    error (_("Can't detach process."));
+    {
+      /* It is possible that we have an unprocessed exit event for this
+	 pid.  If this is the case then we can ignore the failure to detach
+	 and just pretend that the detach worked, as far as the user is
+	 concerned, the process exited immediately after the detach.  */
+      bool process_has_already_exited = false;
+      struct remote_notif_state *rns = rs->notif_state;
+      auto reply
+	= (struct stop_reply *) rns->pending_event[notif_client_stop.id];
+      if (reply != nullptr && reply->ptid.pid () == pid)
+	{
+	  if (reply->ws.kind () == TARGET_WAITKIND_EXITED)
+	    process_has_already_exited = true;
+	}
+
+      if (!process_has_already_exited)
+	error (_("Can't detach process: %s"), (char *) rs->buf.data ());
+    }
 }
 
 /* This detaches a program to which we previously attached, using
diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
new file mode 100644
index 00000000000..5335842cfe8
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
@@ -0,0 +1,61 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2023 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+
+/* This is set from GDB to allow the main thread to exit.  */
+
+volatile int dont_exit_just_yet = 1;
+
+/* Somewhere to place a breakpoint.  */
+
+void
+breakpt ()
+{
+  /* Just spin.  When the test is started under GDB we never enter the spin
+     loop, but when we attach, the worker thread will be spinning here.  */
+  while (dont_exit_just_yet)
+    sleep (1);
+}
+
+/* Thread function, doesn't do anything but hit a breakpoint.  */
+
+void *
+thread_worker (void *arg)
+{
+  breakpt ();
+  return NULL;
+}
+
+int
+main ()
+{
+  pthread_t thr;
+
+  alarm (300);
+
+  /* Create a thread.  */
+  pthread_create (&thr, NULL, thread_worker, NULL);
+  pthread_detach (thr);
+
+  /* Spin until GDB releases us.  */
+  while (dont_exit_just_yet)
+    sleep (1);
+
+  _exit (0);
+}
diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
new file mode 100644
index 00000000000..97df3bd2783
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
@@ -0,0 +1,143 @@
+# Copyright 2023 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Check for a race condition where in non-stop mode, the user might
+# have a thread other than the main (original) thread selected and use
+# the 'detach' command.
+#
+# As GDB tries to detach it is possible that the main thread might
+# exit, the main thread is still running due to non-stop mode.
+#
+# GDB used to assume that the main thread would always exist when
+# processing the detach, clearly this isn't the case, and this
+# assumption would lead to assertion failures and segfaults.
+#
+# Triggering the precise timing is pretty hard, we need the main
+# thread to exit after the user has entered the 'detach' command, but
+# before GDB enters the detach implementation and stops all threads,
+# the window of opportunity for this bug is actually tiny.
+#
+# However, we can trigger this bug 100% from Python, as GDB's
+# event-loop only kicks in once we return from a Python function.
+# Thus, if we have a single Python function that causes the main
+# thread to exit, and then calls detach GDB will not have a chance to
+# handle the main thread exiting before entering the detach code.
+
+standard_testfile
+
+require allow_python_tests
+
+if [build_executable "failed to prepare" $testfile $srcfile {debug pthreads}] {
+    return -1
+}
+
+# Run the test.  When SPAWN_INFERIOR is true the inferior is started
+# as a separate process which GDB then attaches too.  When
+# SPAWN_INFERIOR is false the inferior is started directly within GDB.
+
+proc run_test { spawn_inferior } {
+    global srcfile
+    global gdb_prompt
+    global GDBFLAGS
+
+    save_vars { GDBFLAGS } {
+	append GDBFLAGS " -ex \"set non-stop on\""
+	clean_restart $::binfile
+    }
+
+    # Setup the inferior.  When complete the main thread (#1) will
+    # still be running (due to non-stop mode), while the worker thread
+    # (#2) will be stopped.
+    #
+    # There are two setup modes, when SPAWN_INFERIOR is true we span a
+    # separate process and attach to it, after the attach both threads
+    # are stopped, so it is necessary to resume thread #1.
+    #
+    # When SPAWN_INFERIOR is false we just start the inferior within
+    # GDB, in this case we place a breakpoint that will be hit by
+    # thread #2.  When the breakpoint is hit thread #1 will remain
+    # running.
+    if {$spawn_inferior} {
+	set test_spawn_id [spawn_wait_for_attach $::binfile]
+	set testpid [spawn_id_get_pid $test_spawn_id]
+
+	set escapedbinfile  [string_to_regexp $::binfile]
+	gdb_test -no-prompt-anchor "attach $testpid" \
+	    "Attaching to program.*`?$escapedbinfile'?, process $testpid.*" \
+	    "attach to the inferior"
+
+	# Attaching to a multi-threaded application in non-stop mode
+	# can result in thread stops being reported after the prompt
+	# is displayed.
+	#
+	# Send a simple command now just to resync the command prompt.
+	gdb_test "p 1 + 2" " = 3"
+
+	# Set thread 1 (the current thread) running again.
+	gdb_test "continue&"
+    } else {
+	if {![runto_main]} {
+	    return -1
+	}
+
+	gdb_breakpoint "breakpt"
+	gdb_continue_to_breakpoint "run to breakpoint"
+    }
+
+    # Switch to thread 2.
+    gdb_test "thread 2" \
+	[multi_line \
+	     "Switching to thread 2\[^\r\n\]*" \
+	     "#0\\s+.*"]
+
+    # Create a Python function that sets a variable in the inferior and
+    # then detaches.  Setting the variable in the inferior will allow the
+    # main thread to exit, we even sleep for a short while in order to
+    # give the inferior a chance to exit.
+    #
+    # However, we don't want GDB to notice the exit before we call detach,
+    # which is why we perform both these actions from a Python function.
+    gdb_test_multiline "Create worker function" \
+	"python" "" \
+	"import time" "" \
+	"def set_and_detach():" "" \
+	"   gdb.execute(\"set variable dont_exit_just_yet=0\")" "" \
+	"   time.sleep(1)" "" \
+	"   gdb.execute(\"detach\")" "" \
+	"end" ""
+
+    # The Python function performs two actions, the first causes the
+    # main thread to exit, while the second detaches from the inferior.
+    #
+    # In both cases the stop arrives while GDB is processing the
+    # detach, however, for remote targets GDB doesn't report the stop,
+    # while for local targets GDB does report the stop.
+    if {![gdb_is_target_remote]} {
+	set stop_re "\\\[Thread.*exited\\\]\r\n"
+    } else {
+	set stop_re ""
+    }
+    gdb_test "python set_and_detach()" \
+	"${stop_re}\\\[Inferior.*detached\\\]"
+}
+
+foreach_with_prefix spawn_inferior { true false } {
+    if {$spawn_inferior && ![can_spawn_for_attach]} {
+	# If spawning (and attaching too) a separate inferior is not
+	# supported for the current board, then lets not try too.
+	continue
+    }
+
+    run_test $spawn_inferior
+}
diff --git a/gdbserver/mem-break.cc b/gdbserver/mem-break.cc
index 897b9a2273c..12143caf104 100644
--- a/gdbserver/mem-break.cc
+++ b/gdbserver/mem-break.cc
@@ -976,6 +976,13 @@ static struct gdb_breakpoint *
 find_gdb_breakpoint (char z_type, CORE_ADDR addr, int kind)
 {
   struct process_info *proc = current_process ();
+
+  /* GDB might not yet be aware that the current process has exited and so
+     still requests that we modify (delete) a breakpoint.  Just claim we
+     can't find the breakpoint.  */
+  if (proc == nullptr)
+    return nullptr;
+
   struct breakpoint *bp;
   enum bkpt_type type = Z_packet_to_bkpt_type (z_type);
 

base-commit: 1720b64f735ff2798ab50ea9e2a40ab42af6cc6e
-- 
2.25.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv3] gdb: handle main thread exiting during detach
  2023-08-03  9:50   ` [PATCHv2] " Andrew Burgess
@ 2023-08-23 11:26     ` Andrew Burgess
  2023-09-29 13:57       ` [PATCHv4] " Andrew Burgess
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Burgess @ 2023-08-23 11:26 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

Changes since v2:

  - In remote_target::remote_detach_pid, I now use:

      remote_notif_get_pending_events (&notif_client_stop);
      for (stop_reply_up &reply : rs->stop_reply_queue)
        ...

    To look for pending stop events, rather than accessing the pending
    event queue directly.  This ensures that GDB sends the ACK of any
    pending stop events first before making use of them, and also is
    more inline with how we peek at events in other places in remote.c.

  - Read through the commit message and reworded some parts of it to
    make things clearer.

Changes since v1:

  - Rebased to current upstream/master,

  - Extended a gdb_assert call, and added an additional gdb_assert
    call in linux-nat.c,

  - Added an extra comment in linux-nat.c,

  - Retested.

---

Overview
========

Consider the following situation, GDB is in non-stop mode, the main
thread is running while a second thread is stopped.  The user has the
second thread selected as the current thread and asks GDB to detach.
At the exact moment of detach the main thread exits.

This situation currently causes crashes, assertion failures, and
unexpected errors to be reported from GDB for both native and remote
targets.

This commit addresses this situation for native and remote targets,
the fixes are small, but different.

Native Linux Target
===================

For the native Linux target, detaching is handled in the function
linux_nat_target::detach.  In here we call stop_wait_callback for each
thread, and it is this callback that will spot that the main thread
has exited.

GDB then detaches from everything except the main thread by calling
detach_callback.

After this the first problem is this assert:

  /* Only the initial process should be left right now.  */
  gdb_assert (num_lwps (pid) == 1);

The num_lwps call will return 0 as the main thread has exited and all
of the other threads have now been detached.  I fix this by changing
the assert to allow for 0 or 1 lwps at this point.  As the 0 case can
only happen in non-stop mode, the assert becomes:

  gdb_assert (num_lwps (pid) == 1
	      || (target_is_non_stop_p () && num_lwps (pid) == 0));

The next problem is that we do:

  main_lwp = find_lwp_pid (ptid_t (pid));

and then proceed assuming that main_lwp is not nullptr.  In the case
that the main thread has exited though, main_lwp will be nullptr.

However, we only need main_lwp so that GDB can detach from the
thread.  If the main thread has exited, and GDB has already detached
from every other thread, then GDB has finished detaching, GDB can skip
the calls that try to detach from the main thread, and then tell the
user that the detach was a success.

For Remote Targets
==================

On remote targets there are two problems.

First is that when the exit occurs during the early phase of the
detach, we see the stop notification arrive while GDB is removing the
breakpoints ahead of the detach.  The 'set debug remote on' trace
looks like this:

  [remote] Sending packet: $z0,7f1648fe0241,1#35
  [remote]   Notification received: Stop:W0;process:2a0ac8
  # At this point an unpatched gdbserver segfaults, and the connection
  # is broken.  A patched gdbserver continues as below...
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff00a8,1#68
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff132f,1#6b
  [remote] Packet received: E01
  [remote] Sending packet: $D;2a0ac8#3e
  [remote] Packet received: E01

I was originally running into Segmentation Faults, from within
gdbserver/mem-break.cc, in the function find_gdb_breakpoint.  This
function calls current_process() and then dereferences the result to
find the breakpoint list.

However, in our case, the current process has already exited, and so
the current_process() call returns nullptr.  At the point of failure,
the gdbserver backtrace looks like this:

  #0  0x00000000004190e4 in find_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:982
  #1  0x000000000041930d in delete_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:1093
  #2  0x000000000042d8db in process_serial_event () at ../../src/gdbserver/server.cc:4372
  #3  0x000000000042dcab in handle_serial_event (err=0, client_data=0x0) at ../../src/gdbserver/server.cc:4498
  ...

The problem is that, as a result non-stop being on, the process
exiting is only reported back to GDB after the request to remove a
breakpoint has been sent.  Clearly gdbserver can't actually remove
this breakpoint -- the process has already exited -- so I think the
best solution is for gdbserver just to report an error, which is what
I've done.

The second problem I ran into was on the gdb side, as the process has
already exited, but GDB has not yet acknowledged the exit event, the
detach -- the 'D' packet in the above trace -- fails.  This was being
reported to the user with a 'Can't detach process' error.  As the test
actually calls detach from Python code, this error was then becoming a
Python exception.

Though clearly the detach has returned an error, and so, maybe, having
GDB throw an error would be fine, I think in this case, there's a good
argument that the remote error can be ignored -- if GDB tries to
detach and gets back an error, and if there's a pending exit event for
the pid we tried to detach, then just ignore the error and pretend the
detach worked fine.

We could possibly check for a pending exit event before sending the
detach packet, however, I believe that it might be possible (in
non-stop mode) for the stop notification to arrive after the detach is
sent, but before gdbserver has started processing the detach.  In this
case we would still need to check for pending stop events after seeing
the detach fail, so I figure there's no point having two checks -- we
just send the detach request, and if it fails, check to see if the
process has already exited.

Testing
=======

In order to test this issue I needed to ensure that the exit event
arrives at the same time as the detach call.  The window of
opportunity for getting the exit to arrive is so small I've never
managed to trigger this in real use -- I originally spotted this issue
while working on another patch, which did manage to trigger this
issue.

However, if we trigger both the exit and the detach from a single
Python function then we never return to GDB's event loop, as such GDB
never processes the exit event, and so the first time GDB gets a
chance to see the exit is during the detach call.  And so that is the
approach I've taken for testing this patch.
---
 gdb/linux-nat.c                               |  24 ++-
 gdb/remote.c                                  |  27 +++-
 .../main-thread-exit-during-detach.c          |  61 ++++++++
 .../main-thread-exit-during-detach.exp        | 139 ++++++++++++++++++
 gdbserver/mem-break.cc                        |   7 +
 5 files changed, 250 insertions(+), 8 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
 create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index e58d67183b5..b16bc8cc983 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1420,10 +1420,12 @@ linux_nat_target::detach (inferior *inf, int from_tty)
 
   iterate_over_lwps (ptid_t (pid), detach_callback);
 
-  /* Only the initial process should be left right now.  */
-  gdb_assert (num_lwps (pid) == 1);
-
-  main_lwp = find_lwp_pid (ptid_t (pid));
+  /* We have detached from everything except the main thread now, so
+     should only have one thread left.  However, in non-stop mode the
+     main thread might have exited, in which case we'll have no threads
+     left.  */
+  gdb_assert (num_lwps (pid) == 1
+	      || (target_is_non_stop_p () && num_lwps (pid) == 0));
 
   if (forks_exist_p ())
     {
@@ -1437,10 +1439,18 @@ linux_nat_target::detach (inferior *inf, int from_tty)
     {
       target_announce_detach (from_tty);
 
-      /* Pass on any pending signal for the last LWP.  */
-      int signo = get_detach_signal (main_lwp);
+      /* In non-stop mode it is possible that the main thread has exited,
+	 in which case we don't try to detach.  */
+      main_lwp = find_lwp_pid (ptid_t (pid));
+      if (main_lwp != nullptr)
+	{
+	  /* Pass on any pending signal for the last LWP.  */
+	  int signo = get_detach_signal (main_lwp);
 
-      detach_one_lwp (main_lwp, &signo);
+	  detach_one_lwp (main_lwp, &signo);
+	}
+      else
+	gdb_assert (target_is_non_stop_p ());
 
       detach_success (inf);
     }
diff --git a/gdb/remote.c b/gdb/remote.c
index 7abe08439b5..ef0d0abb0a6 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -6126,7 +6126,32 @@ remote_target::remote_detach_pid (int pid)
   else if (rs->buf[0] == '\0')
     error (_("Remote doesn't know how to detach"));
   else
-    error (_("Can't detach process."));
+    {
+      /* It is possible that we have an unprocessed exit event for this
+	 pid.  If this is the case then we can ignore the failure to detach
+	 and just pretend that the detach worked, as far as the user is
+	 concerned, the process exited immediately after the detach.  */
+      bool process_has_already_exited = false;
+      remote_notif_get_pending_events (&notif_client_stop);
+      for (stop_reply_up &reply : rs->stop_reply_queue)
+	{
+	  if (reply->ptid.pid () != pid)
+	    continue;
+
+	  enum target_waitkind kind = reply->ws.kind ();
+	  if (kind == TARGET_WAITKIND_EXITED
+	      || kind == TARGET_WAITKIND_SIGNALLED)
+	    {
+	      process_has_already_exited = true;
+	      remote_debug_printf ("detach failed, but process "
+				   "already exited");
+	      break;
+	    }
+	}
+
+      if (!process_has_already_exited)
+	error (_("Can't detach process: %s"), (char *) rs->buf.data ());
+    }
 }
 
 /* This detaches a program to which we previously attached, using
diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
new file mode 100644
index 00000000000..5335842cfe8
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
@@ -0,0 +1,61 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2023 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+
+/* This is set from GDB to allow the main thread to exit.  */
+
+volatile int dont_exit_just_yet = 1;
+
+/* Somewhere to place a breakpoint.  */
+
+void
+breakpt ()
+{
+  /* Just spin.  When the test is started under GDB we never enter the spin
+     loop, but when we attach, the worker thread will be spinning here.  */
+  while (dont_exit_just_yet)
+    sleep (1);
+}
+
+/* Thread function, doesn't do anything but hit a breakpoint.  */
+
+void *
+thread_worker (void *arg)
+{
+  breakpt ();
+  return NULL;
+}
+
+int
+main ()
+{
+  pthread_t thr;
+
+  alarm (300);
+
+  /* Create a thread.  */
+  pthread_create (&thr, NULL, thread_worker, NULL);
+  pthread_detach (thr);
+
+  /* Spin until GDB releases us.  */
+  while (dont_exit_just_yet)
+    sleep (1);
+
+  _exit (0);
+}
diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
new file mode 100644
index 00000000000..7d57a53827a
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
@@ -0,0 +1,139 @@
+# Copyright 2023 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Check for a race condition where in non-stop mode, the user might
+# have a thread other than the main (original) thread selected and use
+# the 'detach' command.
+#
+# As GDB tries to detach it is possible that the main thread might
+# exit, the main thread is still running due to non-stop mode.
+#
+# GDB used to assume that the main thread would always exist when
+# processing the detach, clearly this isn't the case, and this
+# assumption would lead to assertion failures and segfaults.
+#
+# Triggering the precise timing is pretty hard, we need the main
+# thread to exit after the user has entered the 'detach' command, but
+# before GDB enters the detach implementation and stops all threads,
+# the window of opportunity for this bug is actually tiny.
+#
+# However, we can trigger this bug 100% from Python, as GDB's
+# event-loop only kicks in once we return from a Python function.
+# Thus, if we have a single Python function that causes the main
+# thread to exit, and then calls detach GDB will not have a chance to
+# handle the main thread exiting before entering the detach code.
+
+standard_testfile
+
+require allow_python_tests
+
+if [build_executable "failed to prepare" $testfile $srcfile {debug pthreads}] {
+    return -1
+}
+
+# Run the test.  When SPAWN_INFERIOR is true the inferior is started
+# as a separate process which GDB then attaches too.  When
+# SPAWN_INFERIOR is false the inferior is started directly within GDB.
+
+proc run_test { spawn_inferior } {
+    save_vars { ::GDBFLAGS } {
+	append ::GDBFLAGS " -ex \"set non-stop on\""
+	clean_restart $::binfile
+    }
+
+    # Setup the inferior.  When complete the main thread (#1) will
+    # still be running (due to non-stop mode), while the worker thread
+    # (#2) will be stopped.
+    #
+    # There are two setup modes, when SPAWN_INFERIOR is true we span a
+    # separate process and attach to it, after the attach both threads
+    # are stopped, so it is necessary to resume thread #1.
+    #
+    # When SPAWN_INFERIOR is false we just start the inferior within
+    # GDB, in this case we place a breakpoint that will be hit by
+    # thread #2.  When the breakpoint is hit thread #1 will remain
+    # running.
+    if {$spawn_inferior} {
+	set test_spawn_id [spawn_wait_for_attach $::binfile]
+	set testpid [spawn_id_get_pid $test_spawn_id]
+
+	set escapedbinfile  [string_to_regexp $::binfile]
+	gdb_test -no-prompt-anchor "attach $testpid" \
+	    "Attaching to program.*`?$escapedbinfile'?, process $testpid.*" \
+	    "attach to the inferior"
+
+	# Attaching to a multi-threaded application in non-stop mode
+	# can result in thread stops being reported after the prompt
+	# is displayed.
+	#
+	# Send a simple command now just to resync the command prompt.
+	gdb_test "p 1 + 2" " = 3"
+
+	# Set thread 1 (the current thread) running again.
+	gdb_test "continue&"
+    } else {
+	if {![runto_main]} {
+	    return -1
+	}
+
+	gdb_breakpoint "breakpt"
+	gdb_continue_to_breakpoint "run to breakpoint"
+    }
+
+    # Switch to thread 2.
+    gdb_test "thread 2" \
+	[multi_line \
+	     "Switching to thread 2\[^\r\n\]*" \
+	     "#0\\s+.*"]
+
+    # Create a Python function that sets a variable in the inferior and
+    # then detaches.  Setting the variable in the inferior will allow the
+    # main thread to exit, we even sleep for a short while in order to
+    # give the inferior a chance to exit.
+    #
+    # However, we don't want GDB to notice the exit before we call detach,
+    # which is why we perform both these actions from a Python function.
+    gdb_test_multiline "Create worker function" \
+	"python" "" \
+	"import time" "" \
+	"def set_and_detach():" "" \
+	"   gdb.execute(\"set variable dont_exit_just_yet=0\")" "" \
+	"   time.sleep(1)" "" \
+	"   gdb.execute(\"detach\")" "" \
+	"end" ""
+
+    # The Python function performs two actions, the first causes the
+    # main thread to exit, while the second detaches from the inferior.
+    #
+    # In both cases the stop arrives while GDB is processing the
+    # detach, however, for remote targets GDB doesn't report the stop,
+    # while for local targets GDB does report the stop.
+    if {![gdb_is_target_remote]} {
+	set stop_re "\\\[Thread.*exited\\\]\r\n"
+    } else {
+	set stop_re ""
+    }
+    gdb_test "python set_and_detach()" \
+	"${stop_re}\\\[Inferior.*detached\\\]"
+}
+
+foreach_with_prefix spawn_inferior { true false } {
+    if {$spawn_inferior && ![can_spawn_for_attach]} {
+	# If spawning (and attaching too) a separate inferior is not
+	# supported for the current board, then skip this test.
+	continue
+    }
+
+    run_test $spawn_inferior
+}
diff --git a/gdbserver/mem-break.cc b/gdbserver/mem-break.cc
index 897b9a2273c..12143caf104 100644
--- a/gdbserver/mem-break.cc
+++ b/gdbserver/mem-break.cc
@@ -976,6 +976,13 @@ static struct gdb_breakpoint *
 find_gdb_breakpoint (char z_type, CORE_ADDR addr, int kind)
 {
   struct process_info *proc = current_process ();
+
+  /* GDB might not yet be aware that the current process has exited and so
+     still requests that we modify (delete) a breakpoint.  Just claim we
+     can't find the breakpoint.  */
+  if (proc == nullptr)
+    return nullptr;
+
   struct breakpoint *bp;
   enum bkpt_type type = Z_packet_to_bkpt_type (z_type);
 

base-commit: 3ce8f906be7a55d8c0375e6d360cc53b456d86ae
-- 
2.25.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv4] gdb: handle main thread exiting during detach
  2023-08-23 11:26     ` [PATCHv3] " Andrew Burgess
@ 2023-09-29 13:57       ` Andrew Burgess
  2023-10-18  9:49         ` [PATCHv5] " Andrew Burgess
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Burgess @ 2023-09-29 13:57 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

Changes since v3:

  - Rebased to current master,

  - Reworded a comment in gdbserver/mem-break.cc, reformatted the code
    in gdb/remote.c, but neither are functional changes.

Changes since v2:

  - In remote_target::remote_detach_pid, I now use:

      remote_notif_get_pending_events (&notif_client_stop);
      for (stop_reply_up &reply : rs->stop_reply_queue)
        ...

    To look for pending stop events, rather than accessing the pending
    event queue directly.  This ensures that GDB sends the ACK of any
    pending stop events first before making use of them, and also is
    more inline with how we peek at events in other places in remote.c.

  - Read through the commit message and reworded some parts of it to
    make things clearer.

Changes since v1:

  - Rebased to current upstream/master,

  - Extended a gdb_assert call, and added an additional gdb_assert
    call in linux-nat.c,

  - Added an extra comment in linux-nat.c,

  - Retested.

---

Overview
========

Consider the following situation, GDB is in non-stop mode, the main
thread is running while a second thread is stopped.  The user has the
second thread selected as the current thread and asks GDB to detach.
At the exact moment of detach the main thread exits.

This situation currently causes crashes, assertion failures, and
unexpected errors to be reported from GDB for both native and remote
targets.

This commit addresses this situation for native and remote targets.
There are a number of different fixes, but all are required in order
to get this functionality working correct for native and remote
targets.

Native Linux Target
===================

For the native Linux target, detaching is handled in the function
linux_nat_target::detach.  In here we call stop_wait_callback for each
thread, and it is this callback that will spot that the main thread
has exited.

GDB then detaches from everything except the main thread by calling
detach_callback.

After this the first problem is this assert:

  /* Only the initial process should be left right now.  */
  gdb_assert (num_lwps (pid) == 1);

The num_lwps call will return 0 as the main thread has exited and all
of the other threads have now been detached.  I fix this by changing
the assert to allow for 0 or 1 lwps at this point.  As the 0 case can
only happen in non-stop mode, the assert becomes:

  gdb_assert (num_lwps (pid) == 1
	      || (target_is_non_stop_p () && num_lwps (pid) == 0));

The next problem is that we do:

  main_lwp = find_lwp_pid (ptid_t (pid));

and then proceed assuming that main_lwp is not nullptr.  In the case
that the main thread has exited though, main_lwp will be nullptr.

However, we only need main_lwp so that GDB can detach from the
thread.  If the main thread has exited, and GDB has already detached
from every other thread, then GDB has finished detaching, GDB can skip
the calls that try to detach from the main thread, and then tell the
user that the detach was a success.

For Remote Targets
==================

On remote targets there are two problems.

First is that when the exit occurs during the early phase of the
detach, we see the stop notification arrive while GDB is removing the
breakpoints ahead of the detach.  The 'set debug remote on' trace
looks like this:

  [remote] Sending packet: $z0,7f1648fe0241,1#35
  [remote]   Notification received: Stop:W0;process:2a0ac8
  # At this point an unpatched gdbserver segfaults, and the connection
  # is broken.  A patched gdbserver continues as below...
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff00a8,1#68
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff132f,1#6b
  [remote] Packet received: E01
  [remote] Sending packet: $D;2a0ac8#3e
  [remote] Packet received: E01

I was originally running into Segmentation Faults, from within
gdbserver/mem-break.cc, in the function find_gdb_breakpoint.  This
function calls current_process() and then dereferences the result to
find the breakpoint list.

However, in our case, the current process has already exited, and so
the current_process() call returns nullptr.  At the point of failure,
the gdbserver backtrace looks like this:

  #0  0x00000000004190e4 in find_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:982
  #1  0x000000000041930d in delete_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:1093
  #2  0x000000000042d8db in process_serial_event () at ../../src/gdbserver/server.cc:4372
  #3  0x000000000042dcab in handle_serial_event (err=0, client_data=0x0) at ../../src/gdbserver/server.cc:4498
  ...

The problem is that, as a result non-stop being on, the process
exiting is only reported back to GDB after the request to remove a
breakpoint has been sent.  Clearly gdbserver can't actually remove
this breakpoint -- the process has already exited -- so I think the
best solution is for gdbserver just to report an error, which is what
I've done.

The second problem I ran into was on the gdb side, as the process has
already exited, but GDB has not yet acknowledged the exit event, the
detach -- the 'D' packet in the above trace -- fails.  This was being
reported to the user with a 'Can't detach process' error.  As the test
actually calls detach from Python code, this error was then becoming a
Python exception.

Though clearly the detach has returned an error, and so, maybe, having
GDB throw an error would be fine, I think in this case, there's a good
argument that the remote error can be ignored -- if GDB tries to
detach and gets back an error, and if there's a pending exit event for
the pid we tried to detach, then just ignore the error and pretend the
detach worked fine.

We could possibly check for a pending exit event before sending the
detach packet, however, I believe that it might be possible (in
non-stop mode) for the stop notification to arrive after the detach is
sent, but before gdbserver has started processing the detach.  In this
case we would still need to check for pending stop events after seeing
the detach fail, so I figure there's no point having two checks -- we
just send the detach request, and if it fails, check to see if the
process has already exited.

Testing
=======

In order to test this issue I needed to ensure that the exit event
arrives at the same time as the detach call.  The window of
opportunity for getting the exit to arrive is so small I've never
managed to trigger this in real use -- I originally spotted this issue
while working on another patch, which did manage to trigger this
issue.

However, if we trigger both the exit and the detach from a single
Python function then we never return to GDB's event loop, as such GDB
never processes the exit event, and so the first time GDB gets a
chance to see the exit is during the detach call.  And so that is the
approach I've taken for testing this patch.
---
 gdb/linux-nat.c                               |  24 ++-
 gdb/remote.c                                  |  27 +++-
 .../main-thread-exit-during-detach.c          |  61 ++++++++
 .../main-thread-exit-during-detach.exp        | 139 ++++++++++++++++++
 gdbserver/mem-break.cc                        |  11 ++
 5 files changed, 254 insertions(+), 8 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
 create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 7e36ced6292..1e79b9ba53f 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1420,10 +1420,12 @@ linux_nat_target::detach (inferior *inf, int from_tty)
 
   iterate_over_lwps (ptid_t (pid), detach_callback);
 
-  /* Only the initial process should be left right now.  */
-  gdb_assert (num_lwps (pid) == 1);
-
-  main_lwp = find_lwp_pid (ptid_t (pid));
+  /* We have detached from everything except the main thread now, so
+     should only have one thread left.  However, in non-stop mode the
+     main thread might have exited, in which case we'll have no threads
+     left.  */
+  gdb_assert (num_lwps (pid) == 1
+	      || (target_is_non_stop_p () && num_lwps (pid) == 0));
 
   if (forks_exist_p ())
     {
@@ -1437,10 +1439,18 @@ linux_nat_target::detach (inferior *inf, int from_tty)
     {
       target_announce_detach (from_tty);
 
-      /* Pass on any pending signal for the last LWP.  */
-      int signo = get_detach_signal (main_lwp);
+      /* In non-stop mode it is possible that the main thread has exited,
+	 in which case we don't try to detach.  */
+      main_lwp = find_lwp_pid (ptid_t (pid));
+      if (main_lwp != nullptr)
+	{
+	  /* Pass on any pending signal for the last LWP.  */
+	  int signo = get_detach_signal (main_lwp);
 
-      detach_one_lwp (main_lwp, &signo);
+	  detach_one_lwp (main_lwp, &signo);
+	}
+      else
+	gdb_assert (target_is_non_stop_p ());
 
       detach_success (inf);
     }
diff --git a/gdb/remote.c b/gdb/remote.c
index 9bb4f1de598..40b988f7b6d 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -6122,7 +6122,32 @@ remote_target::remote_detach_pid (int pid)
   else if (rs->buf[0] == '\0')
     error (_("Remote doesn't know how to detach"));
   else
-    error (_("Can't detach process."));
+    {
+      /* It is possible that we have an unprocessed exit event for this
+	 pid.  If this is the case then we can ignore the failure to detach
+	 and just pretend that the detach worked, as far as the user is
+	 concerned, the process exited immediately after the detach.  */
+      bool process_has_already_exited = false;
+      remote_notif_get_pending_events (&notif_client_stop);
+      for (stop_reply_up &reply : rs->stop_reply_queue)
+	{
+	  if (reply->ptid.pid () != pid)
+	    continue;
+
+	  enum target_waitkind kind = reply->ws.kind ();
+	  if (kind == TARGET_WAITKIND_EXITED
+	      || kind == TARGET_WAITKIND_SIGNALLED)
+	    {
+	      process_has_already_exited = true;
+	      remote_debug_printf
+		("detach failed, but process already exited");
+	      break;
+	    }
+	}
+
+      if (!process_has_already_exited)
+	error (_("can't detach process: %s"), (char *) rs->buf.data ());
+    }
 }
 
 /* This detaches a program to which we previously attached, using
diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
new file mode 100644
index 00000000000..5335842cfe8
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
@@ -0,0 +1,61 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2023 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+
+/* This is set from GDB to allow the main thread to exit.  */
+
+volatile int dont_exit_just_yet = 1;
+
+/* Somewhere to place a breakpoint.  */
+
+void
+breakpt ()
+{
+  /* Just spin.  When the test is started under GDB we never enter the spin
+     loop, but when we attach, the worker thread will be spinning here.  */
+  while (dont_exit_just_yet)
+    sleep (1);
+}
+
+/* Thread function, doesn't do anything but hit a breakpoint.  */
+
+void *
+thread_worker (void *arg)
+{
+  breakpt ();
+  return NULL;
+}
+
+int
+main ()
+{
+  pthread_t thr;
+
+  alarm (300);
+
+  /* Create a thread.  */
+  pthread_create (&thr, NULL, thread_worker, NULL);
+  pthread_detach (thr);
+
+  /* Spin until GDB releases us.  */
+  while (dont_exit_just_yet)
+    sleep (1);
+
+  _exit (0);
+}
diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
new file mode 100644
index 00000000000..7d57a53827a
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
@@ -0,0 +1,139 @@
+# Copyright 2023 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Check for a race condition where in non-stop mode, the user might
+# have a thread other than the main (original) thread selected and use
+# the 'detach' command.
+#
+# As GDB tries to detach it is possible that the main thread might
+# exit, the main thread is still running due to non-stop mode.
+#
+# GDB used to assume that the main thread would always exist when
+# processing the detach, clearly this isn't the case, and this
+# assumption would lead to assertion failures and segfaults.
+#
+# Triggering the precise timing is pretty hard, we need the main
+# thread to exit after the user has entered the 'detach' command, but
+# before GDB enters the detach implementation and stops all threads,
+# the window of opportunity for this bug is actually tiny.
+#
+# However, we can trigger this bug 100% from Python, as GDB's
+# event-loop only kicks in once we return from a Python function.
+# Thus, if we have a single Python function that causes the main
+# thread to exit, and then calls detach GDB will not have a chance to
+# handle the main thread exiting before entering the detach code.
+
+standard_testfile
+
+require allow_python_tests
+
+if [build_executable "failed to prepare" $testfile $srcfile {debug pthreads}] {
+    return -1
+}
+
+# Run the test.  When SPAWN_INFERIOR is true the inferior is started
+# as a separate process which GDB then attaches too.  When
+# SPAWN_INFERIOR is false the inferior is started directly within GDB.
+
+proc run_test { spawn_inferior } {
+    save_vars { ::GDBFLAGS } {
+	append ::GDBFLAGS " -ex \"set non-stop on\""
+	clean_restart $::binfile
+    }
+
+    # Setup the inferior.  When complete the main thread (#1) will
+    # still be running (due to non-stop mode), while the worker thread
+    # (#2) will be stopped.
+    #
+    # There are two setup modes, when SPAWN_INFERIOR is true we span a
+    # separate process and attach to it, after the attach both threads
+    # are stopped, so it is necessary to resume thread #1.
+    #
+    # When SPAWN_INFERIOR is false we just start the inferior within
+    # GDB, in this case we place a breakpoint that will be hit by
+    # thread #2.  When the breakpoint is hit thread #1 will remain
+    # running.
+    if {$spawn_inferior} {
+	set test_spawn_id [spawn_wait_for_attach $::binfile]
+	set testpid [spawn_id_get_pid $test_spawn_id]
+
+	set escapedbinfile  [string_to_regexp $::binfile]
+	gdb_test -no-prompt-anchor "attach $testpid" \
+	    "Attaching to program.*`?$escapedbinfile'?, process $testpid.*" \
+	    "attach to the inferior"
+
+	# Attaching to a multi-threaded application in non-stop mode
+	# can result in thread stops being reported after the prompt
+	# is displayed.
+	#
+	# Send a simple command now just to resync the command prompt.
+	gdb_test "p 1 + 2" " = 3"
+
+	# Set thread 1 (the current thread) running again.
+	gdb_test "continue&"
+    } else {
+	if {![runto_main]} {
+	    return -1
+	}
+
+	gdb_breakpoint "breakpt"
+	gdb_continue_to_breakpoint "run to breakpoint"
+    }
+
+    # Switch to thread 2.
+    gdb_test "thread 2" \
+	[multi_line \
+	     "Switching to thread 2\[^\r\n\]*" \
+	     "#0\\s+.*"]
+
+    # Create a Python function that sets a variable in the inferior and
+    # then detaches.  Setting the variable in the inferior will allow the
+    # main thread to exit, we even sleep for a short while in order to
+    # give the inferior a chance to exit.
+    #
+    # However, we don't want GDB to notice the exit before we call detach,
+    # which is why we perform both these actions from a Python function.
+    gdb_test_multiline "Create worker function" \
+	"python" "" \
+	"import time" "" \
+	"def set_and_detach():" "" \
+	"   gdb.execute(\"set variable dont_exit_just_yet=0\")" "" \
+	"   time.sleep(1)" "" \
+	"   gdb.execute(\"detach\")" "" \
+	"end" ""
+
+    # The Python function performs two actions, the first causes the
+    # main thread to exit, while the second detaches from the inferior.
+    #
+    # In both cases the stop arrives while GDB is processing the
+    # detach, however, for remote targets GDB doesn't report the stop,
+    # while for local targets GDB does report the stop.
+    if {![gdb_is_target_remote]} {
+	set stop_re "\\\[Thread.*exited\\\]\r\n"
+    } else {
+	set stop_re ""
+    }
+    gdb_test "python set_and_detach()" \
+	"${stop_re}\\\[Inferior.*detached\\\]"
+}
+
+foreach_with_prefix spawn_inferior { true false } {
+    if {$spawn_inferior && ![can_spawn_for_attach]} {
+	# If spawning (and attaching too) a separate inferior is not
+	# supported for the current board, then skip this test.
+	continue
+    }
+
+    run_test $spawn_inferior
+}
diff --git a/gdbserver/mem-break.cc b/gdbserver/mem-break.cc
index 897b9a2273c..3bee8bc8990 100644
--- a/gdbserver/mem-break.cc
+++ b/gdbserver/mem-break.cc
@@ -976,6 +976,17 @@ static struct gdb_breakpoint *
 find_gdb_breakpoint (char z_type, CORE_ADDR addr, int kind)
 {
   struct process_info *proc = current_process ();
+
+  /* In some situations the current process exits, we inform GDB, but
+     before GDB can acknowledge that the process has exited GDB tries to
+     detach from the inferior.  As part of the detach process GDB will
+     remove all breakpoints, which means we can end up here when the
+     current process has already exited and so PROC is nullptr.  In this
+     case just claim we can't find (and so delete) the breakpoint, GDB
+     will ignore this error during detach.  */
+  if (proc == nullptr)
+    return nullptr;
+
   struct breakpoint *bp;
   enum bkpt_type type = Z_packet_to_bkpt_type (z_type);
 

base-commit: 68510906a981d6abd31c51f49b2ec7e18db0a338
-- 
2.25.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv5] gdb: handle main thread exiting during detach
  2023-09-29 13:57       ` [PATCHv4] " Andrew Burgess
@ 2023-10-18  9:49         ` Andrew Burgess
  2023-10-25 21:40           ` Kevin Buettner
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Burgess @ 2023-10-18  9:49 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

I'm planning to merge this patch in the near future.  I'm running into
other bugs that are fixed by the remote.c and gdbserver/ changes
contained in this patch.

Do let me know if there are any objections.

Thanks,
Andrew

---

Changes since v4:

  - Rebased to current master,

  - Fixed a small style issue in the text case (added a missing {}
    around an if condition).

Changes since v3:

  - Rebased to current master,

  - Reworded a comment in gdbserver/mem-break.cc, reformatted the code
    in gdb/remote.c, but neither are functional changes.

Changes since v2:

  - In remote_target::remote_detach_pid, I now use:

      remote_notif_get_pending_events (&notif_client_stop);
      for (stop_reply_up &reply : rs->stop_reply_queue)
        ...

    To look for pending stop events, rather than accessing the pending
    event queue directly.  This ensures that GDB sends the ACK of any
    pending stop events first before making use of them, and also is
    more inline with how we peek at events in other places in remote.c.

  - Read through the commit message and reworded some parts of it to
    make things clearer.

Changes since v1:

  - Rebased to current upstream/master,

  - Extended a gdb_assert call, and added an additional gdb_assert
    call in linux-nat.c,

  - Added an extra comment in linux-nat.c,

  - Retested.

---

Overview
========

Consider the following situation, GDB is in non-stop mode, the main
thread is running while a second thread is stopped.  The user has the
second thread selected as the current thread and asks GDB to detach.
At the exact moment of detach the main thread exits.

This situation currently causes crashes, assertion failures, and
unexpected errors to be reported from GDB for both native and remote
targets.

This commit addresses this situation for native and remote targets.
There are a number of different fixes, but all are required in order
to get this functionality working correct for native and remote
targets.

Native Linux Target
===================

For the native Linux target, detaching is handled in the function
linux_nat_target::detach.  In here we call stop_wait_callback for each
thread, and it is this callback that will spot that the main thread
has exited.

GDB then detaches from everything except the main thread by calling
detach_callback.

After this the first problem is this assert:

  /* Only the initial process should be left right now.  */
  gdb_assert (num_lwps (pid) == 1);

The num_lwps call will return 0 as the main thread has exited and all
of the other threads have now been detached.  I fix this by changing
the assert to allow for 0 or 1 lwps at this point.  As the 0 case can
only happen in non-stop mode, the assert becomes:

  gdb_assert (num_lwps (pid) == 1
	      || (target_is_non_stop_p () && num_lwps (pid) == 0));

The next problem is that we do:

  main_lwp = find_lwp_pid (ptid_t (pid));

and then proceed assuming that main_lwp is not nullptr.  In the case
that the main thread has exited though, main_lwp will be nullptr.

However, we only need main_lwp so that GDB can detach from the
thread.  If the main thread has exited, and GDB has already detached
from every other thread, then GDB has finished detaching, GDB can skip
the calls that try to detach from the main thread, and then tell the
user that the detach was a success.

For Remote Targets
==================

On remote targets there are two problems.

First is that when the exit occurs during the early phase of the
detach, we see the stop notification arrive while GDB is removing the
breakpoints ahead of the detach.  The 'set debug remote on' trace
looks like this:

  [remote] Sending packet: $z0,7f1648fe0241,1#35
  [remote]   Notification received: Stop:W0;process:2a0ac8
  # At this point an unpatched gdbserver segfaults, and the connection
  # is broken.  A patched gdbserver continues as below...
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff00a8,1#68
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff132f,1#6b
  [remote] Packet received: E01
  [remote] Sending packet: $D;2a0ac8#3e
  [remote] Packet received: E01

I was originally running into Segmentation Faults, from within
gdbserver/mem-break.cc, in the function find_gdb_breakpoint.  This
function calls current_process() and then dereferences the result to
find the breakpoint list.

However, in our case, the current process has already exited, and so
the current_process() call returns nullptr.  At the point of failure,
the gdbserver backtrace looks like this:

  #0  0x00000000004190e4 in find_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:982
  #1  0x000000000041930d in delete_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:1093
  #2  0x000000000042d8db in process_serial_event () at ../../src/gdbserver/server.cc:4372
  #3  0x000000000042dcab in handle_serial_event (err=0, client_data=0x0) at ../../src/gdbserver/server.cc:4498
  ...

The problem is that, as a result non-stop being on, the process
exiting is only reported back to GDB after the request to remove a
breakpoint has been sent.  Clearly gdbserver can't actually remove
this breakpoint -- the process has already exited -- so I think the
best solution is for gdbserver just to report an error, which is what
I've done.

The second problem I ran into was on the gdb side, as the process has
already exited, but GDB has not yet acknowledged the exit event, the
detach -- the 'D' packet in the above trace -- fails.  This was being
reported to the user with a 'Can't detach process' error.  As the test
actually calls detach from Python code, this error was then becoming a
Python exception.

Though clearly the detach has returned an error, and so, maybe, having
GDB throw an error would be fine, I think in this case, there's a good
argument that the remote error can be ignored -- if GDB tries to
detach and gets back an error, and if there's a pending exit event for
the pid we tried to detach, then just ignore the error and pretend the
detach worked fine.

We could possibly check for a pending exit event before sending the
detach packet, however, I believe that it might be possible (in
non-stop mode) for the stop notification to arrive after the detach is
sent, but before gdbserver has started processing the detach.  In this
case we would still need to check for pending stop events after seeing
the detach fail, so I figure there's no point having two checks -- we
just send the detach request, and if it fails, check to see if the
process has already exited.

Testing
=======

In order to test this issue I needed to ensure that the exit event
arrives at the same time as the detach call.  The window of
opportunity for getting the exit to arrive is so small I've never
managed to trigger this in real use -- I originally spotted this issue
while working on another patch, which did manage to trigger this
issue.

However, if we trigger both the exit and the detach from a single
Python function then we never return to GDB's event loop, as such GDB
never processes the exit event, and so the first time GDB gets a
chance to see the exit is during the detach call.  And so that is the
approach I've taken for testing this patch.
---
 gdb/linux-nat.c                               |  24 ++-
 gdb/remote.c                                  |  27 +++-
 .../main-thread-exit-during-detach.c          |  61 ++++++++
 .../main-thread-exit-during-detach.exp        | 140 ++++++++++++++++++
 gdbserver/mem-break.cc                        |  11 ++
 5 files changed, 255 insertions(+), 8 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
 create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 9148dda4aad..1c9756c18bd 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1420,10 +1420,12 @@ linux_nat_target::detach (inferior *inf, int from_tty)
 
   iterate_over_lwps (ptid_t (pid), detach_callback);
 
-  /* Only the initial process should be left right now.  */
-  gdb_assert (num_lwps (pid) == 1);
-
-  main_lwp = find_lwp_pid (ptid_t (pid));
+  /* We have detached from everything except the main thread now, so
+     should only have one thread left.  However, in non-stop mode the
+     main thread might have exited, in which case we'll have no threads
+     left.  */
+  gdb_assert (num_lwps (pid) == 1
+	      || (target_is_non_stop_p () && num_lwps (pid) == 0));
 
   if (forks_exist_p ())
     {
@@ -1437,10 +1439,18 @@ linux_nat_target::detach (inferior *inf, int from_tty)
     {
       target_announce_detach (from_tty);
 
-      /* Pass on any pending signal for the last LWP.  */
-      int signo = get_detach_signal (main_lwp);
+      /* In non-stop mode it is possible that the main thread has exited,
+	 in which case we don't try to detach.  */
+      main_lwp = find_lwp_pid (ptid_t (pid));
+      if (main_lwp != nullptr)
+	{
+	  /* Pass on any pending signal for the last LWP.  */
+	  int signo = get_detach_signal (main_lwp);
 
-      detach_one_lwp (main_lwp, &signo);
+	  detach_one_lwp (main_lwp, &signo);
+	}
+      else
+	gdb_assert (target_is_non_stop_p ());
 
       detach_success (inf);
     }
diff --git a/gdb/remote.c b/gdb/remote.c
index 961061e02a9..d51112092dd 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -6165,7 +6165,32 @@ remote_target::remote_detach_pid (int pid)
   else if (rs->buf[0] == '\0')
     error (_("Remote doesn't know how to detach"));
   else
-    error (_("Can't detach process."));
+    {
+      /* It is possible that we have an unprocessed exit event for this
+	 pid.  If this is the case then we can ignore the failure to detach
+	 and just pretend that the detach worked, as far as the user is
+	 concerned, the process exited immediately after the detach.  */
+      bool process_has_already_exited = false;
+      remote_notif_get_pending_events (&notif_client_stop);
+      for (stop_reply_up &reply : rs->stop_reply_queue)
+	{
+	  if (reply->ptid.pid () != pid)
+	    continue;
+
+	  enum target_waitkind kind = reply->ws.kind ();
+	  if (kind == TARGET_WAITKIND_EXITED
+	      || kind == TARGET_WAITKIND_SIGNALLED)
+	    {
+	      process_has_already_exited = true;
+	      remote_debug_printf
+		("detach failed, but process already exited");
+	      break;
+	    }
+	}
+
+      if (!process_has_already_exited)
+	error (_("can't detach process: %s"), (char *) rs->buf.data ());
+    }
 }
 
 /* This detaches a program to which we previously attached, using
diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
new file mode 100644
index 00000000000..5335842cfe8
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
@@ -0,0 +1,61 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2023 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+
+/* This is set from GDB to allow the main thread to exit.  */
+
+volatile int dont_exit_just_yet = 1;
+
+/* Somewhere to place a breakpoint.  */
+
+void
+breakpt ()
+{
+  /* Just spin.  When the test is started under GDB we never enter the spin
+     loop, but when we attach, the worker thread will be spinning here.  */
+  while (dont_exit_just_yet)
+    sleep (1);
+}
+
+/* Thread function, doesn't do anything but hit a breakpoint.  */
+
+void *
+thread_worker (void *arg)
+{
+  breakpt ();
+  return NULL;
+}
+
+int
+main ()
+{
+  pthread_t thr;
+
+  alarm (300);
+
+  /* Create a thread.  */
+  pthread_create (&thr, NULL, thread_worker, NULL);
+  pthread_detach (thr);
+
+  /* Spin until GDB releases us.  */
+  while (dont_exit_just_yet)
+    sleep (1);
+
+  _exit (0);
+}
diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
new file mode 100644
index 00000000000..83f5e855fb0
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
@@ -0,0 +1,140 @@
+# Copyright 2023 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Check for a race condition where in non-stop mode, the user might
+# have a thread other than the main (original) thread selected and use
+# the 'detach' command.
+#
+# As GDB tries to detach it is possible that the main thread might
+# exit, the main thread is still running due to non-stop mode.
+#
+# GDB used to assume that the main thread would always exist when
+# processing the detach, clearly this isn't the case, and this
+# assumption would lead to assertion failures and segfaults.
+#
+# Triggering the precise timing is pretty hard, we need the main
+# thread to exit after the user has entered the 'detach' command, but
+# before GDB enters the detach implementation and stops all threads,
+# the window of opportunity for this bug is actually tiny.
+#
+# However, we can trigger this bug 100% from Python, as GDB's
+# event-loop only kicks in once we return from a Python function.
+# Thus, if we have a single Python function that causes the main
+# thread to exit, and then calls detach GDB will not have a chance to
+# handle the main thread exiting before entering the detach code.
+
+standard_testfile
+
+require allow_python_tests
+
+if {[build_executable "failed to prepare" $testfile $srcfile \
+	 {debug pthreads}] == -1} {
+    return -1
+}
+
+# Run the test.  When SPAWN_INFERIOR is true the inferior is started
+# as a separate process which GDB then attaches too.  When
+# SPAWN_INFERIOR is false the inferior is started directly within GDB.
+
+proc run_test { spawn_inferior } {
+    save_vars { ::GDBFLAGS } {
+	append ::GDBFLAGS " -ex \"set non-stop on\""
+	clean_restart $::binfile
+    }
+
+    # Setup the inferior.  When complete the main thread (#1) will
+    # still be running (due to non-stop mode), while the worker thread
+    # (#2) will be stopped.
+    #
+    # There are two setup modes, when SPAWN_INFERIOR is true we span a
+    # separate process and attach to it, after the attach both threads
+    # are stopped, so it is necessary to resume thread #1.
+    #
+    # When SPAWN_INFERIOR is false we just start the inferior within
+    # GDB, in this case we place a breakpoint that will be hit by
+    # thread #2.  When the breakpoint is hit thread #1 will remain
+    # running.
+    if {$spawn_inferior} {
+	set test_spawn_id [spawn_wait_for_attach $::binfile]
+	set testpid [spawn_id_get_pid $test_spawn_id]
+
+	set escapedbinfile  [string_to_regexp $::binfile]
+	gdb_test -no-prompt-anchor "attach $testpid" \
+	    "Attaching to program.*`?$escapedbinfile'?, process $testpid.*" \
+	    "attach to the inferior"
+
+	# Attaching to a multi-threaded application in non-stop mode
+	# can result in thread stops being reported after the prompt
+	# is displayed.
+	#
+	# Send a simple command now just to resync the command prompt.
+	gdb_test "p 1 + 2" " = 3"
+
+	# Set thread 1 (the current thread) running again.
+	gdb_test "continue&"
+    } else {
+	if {![runto_main]} {
+	    return -1
+	}
+
+	gdb_breakpoint "breakpt"
+	gdb_continue_to_breakpoint "run to breakpoint"
+    }
+
+    # Switch to thread 2.
+    gdb_test "thread 2" \
+	[multi_line \
+	     "Switching to thread 2\[^\r\n\]*" \
+	     "#0\\s+.*"]
+
+    # Create a Python function that sets a variable in the inferior and
+    # then detaches.  Setting the variable in the inferior will allow the
+    # main thread to exit, we even sleep for a short while in order to
+    # give the inferior a chance to exit.
+    #
+    # However, we don't want GDB to notice the exit before we call detach,
+    # which is why we perform both these actions from a Python function.
+    gdb_test_multiline "Create worker function" \
+	"python" "" \
+	"import time" "" \
+	"def set_and_detach():" "" \
+	"   gdb.execute(\"set variable dont_exit_just_yet=0\")" "" \
+	"   time.sleep(1)" "" \
+	"   gdb.execute(\"detach\")" "" \
+	"end" ""
+
+    # The Python function performs two actions, the first causes the
+    # main thread to exit, while the second detaches from the inferior.
+    #
+    # In both cases the stop arrives while GDB is processing the
+    # detach, however, for remote targets GDB doesn't report the stop,
+    # while for local targets GDB does report the stop.
+    if {![gdb_is_target_remote]} {
+	set stop_re "\\\[Thread.*exited\\\]\r\n"
+    } else {
+	set stop_re ""
+    }
+    gdb_test "python set_and_detach()" \
+	"${stop_re}\\\[Inferior.*detached\\\]"
+}
+
+foreach_with_prefix spawn_inferior { true false } {
+    if {$spawn_inferior && ![can_spawn_for_attach]} {
+	# If spawning (and attaching too) a separate inferior is not
+	# supported for the current board, then skip this test.
+	continue
+    }
+
+    run_test $spawn_inferior
+}
diff --git a/gdbserver/mem-break.cc b/gdbserver/mem-break.cc
index 897b9a2273c..3bee8bc8990 100644
--- a/gdbserver/mem-break.cc
+++ b/gdbserver/mem-break.cc
@@ -976,6 +976,17 @@ static struct gdb_breakpoint *
 find_gdb_breakpoint (char z_type, CORE_ADDR addr, int kind)
 {
   struct process_info *proc = current_process ();
+
+  /* In some situations the current process exits, we inform GDB, but
+     before GDB can acknowledge that the process has exited GDB tries to
+     detach from the inferior.  As part of the detach process GDB will
+     remove all breakpoints, which means we can end up here when the
+     current process has already exited and so PROC is nullptr.  In this
+     case just claim we can't find (and so delete) the breakpoint, GDB
+     will ignore this error during detach.  */
+  if (proc == nullptr)
+    return nullptr;
+
   struct breakpoint *bp;
   enum bkpt_type type = Z_packet_to_bkpt_type (z_type);
 

base-commit: 5d4a870e05ac45e3f5a301c672a4079995b5db7a
-- 
2.25.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv5] gdb: handle main thread exiting during detach
  2023-10-18  9:49         ` [PATCHv5] " Andrew Burgess
@ 2023-10-25 21:40           ` Kevin Buettner
  2023-10-26 17:26             ` Andrew Burgess
  0 siblings, 1 reply; 12+ messages in thread
From: Kevin Buettner @ 2023-10-25 21:40 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

On Wed, 18 Oct 2023 10:49:44 +0100
Andrew Burgess <aburgess@redhat.com> wrote:

>  gdb/linux-nat.c                               |  24 ++-
>  gdb/remote.c                                  |  27 +++-
>  .../main-thread-exit-during-detach.c          |  61 ++++++++
>  .../main-thread-exit-during-detach.exp        | 140 ++++++++++++++++++
>  gdbserver/mem-break.cc                        |  11 ++
>  5 files changed, 255 insertions(+), 8 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
>  create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp

Thanks for the detailed commit log!  It made the patch fairly easy to review.

Tested-by: Kevin Buettner <kevinb@redhat.com>
Approved-by: Kevin Buettner <kevinb@redhat.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv5] gdb: handle main thread exiting during detach
  2023-10-25 21:40           ` Kevin Buettner
@ 2023-10-26 17:26             ` Andrew Burgess
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Burgess @ 2023-10-26 17:26 UTC (permalink / raw)
  To: Kevin Buettner; +Cc: gdb-patches

Kevin Buettner <kevinb@redhat.com> writes:

> On Wed, 18 Oct 2023 10:49:44 +0100
> Andrew Burgess <aburgess@redhat.com> wrote:
>
>>  gdb/linux-nat.c                               |  24 ++-
>>  gdb/remote.c                                  |  27 +++-
>>  .../main-thread-exit-during-detach.c          |  61 ++++++++
>>  .../main-thread-exit-during-detach.exp        | 140 ++++++++++++++++++
>>  gdbserver/mem-break.cc                        |  11 ++
>>  5 files changed, 255 insertions(+), 8 deletions(-)
>>  create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
>>  create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
>
> Thanks for the detailed commit log!  It made the patch fairly easy to review.
>
> Tested-by: Kevin Buettner <kevinb@redhat.com>
> Approved-by: Kevin Buettner <kevinb@redhat.com>

Pushed.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-10-26 17:27 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-18 14:31 [PATCH 0/2] Exit during detach Andrew Burgess
2023-07-18 14:31 ` [PATCH 1/2] gdb: fix possible nullptr dereference in a remote_debug_printf call Andrew Burgess
2023-07-19 15:01   ` Tom Tromey
2023-08-03  9:07     ` Andrew Burgess
2023-07-18 14:31 ` [PATCH 2/2] gdb: handle main thread exiting during detach Andrew Burgess
2023-07-18 14:40   ` Andrew Burgess
2023-08-03  9:50   ` [PATCHv2] " Andrew Burgess
2023-08-23 11:26     ` [PATCHv3] " Andrew Burgess
2023-09-29 13:57       ` [PATCHv4] " Andrew Burgess
2023-10-18  9:49         ` [PATCHv5] " Andrew Burgess
2023-10-25 21:40           ` Kevin Buettner
2023-10-26 17:26             ` Andrew Burgess

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).