From: Andrew Burgess <aburgess@redhat.com>
To: gdb-patches@sourceware.org
Subject: Re: [PATCH 2/2] gdb: handle main thread exiting during detach
Date: Tue, 18 Jul 2023 15:40:14 +0100 [thread overview]
Message-ID: <87fs5lgu5t.fsf@redhat.com> (raw)
In-Reply-To: <19e8fbcdf634198ed3ab5dd3ee2e2b36c2353e27.1689690655.git.aburgess@redhat.com>
Andrew Burgess <aburgess@redhat.com> writes:
> Overview
> ========
>
> In non-stop mode, the main thread is running while a second thread is
> stopped. The user has the second thread selected as the current
> thread and asks GDB to detach. At the exact moment of detach the main
> thread exits. This situation causes crashes, assertion failures, and
> unexpected errors to be reported from GDB for both native and remote
> targets.
>
> This commit addresses this situation for native and remote targets,
> but the fixes are all different.
>
> Native Linux Target
> ===================
>
> This is caught in linux_nat_target::detach and the stop_wait_callback
> notices that the thread has exited and cleans up.
>
> GDB then detaches from everything except the main thread by calling
> detach_callback.
>
> After this the first problem is this assert:
>
> /* Only the initial process should be left right now. */
> gdb_assert (num_lwps (pid) == 1);
>
> The num_lwps call will return 0 as the main thread has exited and all
> of the other threads have been detached. I fix this by changing the
> assert too:
>
> gdb_assert (num_lwps (pid) <= 1);
>
> and improving the comment.
>
> The next problem is that we do:
>
> main_lwp = find_lwp_pid (ptid_t (pid));
>
> and then proceed assuming that main_lwp is not nullptr, however, as
> the main thread has exited (num_lwps(pid) == 0), main_lwp is nullptr,
> and so we crash trying to dereference the nullptr. I fix this by
> adding a nullptr check for main_lwp. I think this is fine because the
> actions we are skipping all relate to detaching from the main thread,
> but we'll not be doing this as the thread has already exited!
>
> For Remote Targets
> ==================
>
> On remote targets there are two problems. The first is that when the
> exit occurs during the early phase of the detach, we see the stop
> notification arrive while GDB is removing the breakpoints ahead of the
> detach. The 'set debug remote on' trace looks like this:
>
> [remote] Sending packet: $z0,7f1648fe0241,1#35
> [remote] Notification received: Stop:W0;process:2a0ac8
> # At this point an unpatched gdbserver segfaults, and the connection
> # is broken. A patched gdbserver continues as below...
> [remote] Packet received: E01
> [remote] Sending packet: $z0,7f1648ff00a8,1#68
> [remote] Packet received: E01
> [remote] Sending packet: $z0,7f1648ff132f,1#6b
> [remote] Packet received: E01
> [remote] Sending packet: $D;2a0ac8#3e
> [remote] Packet received: E01
>
> I was originally running into Segmentation Faults, from within
> gdbserver/mem-break.cc, in the function find_gdb_breakpoint. This
> function calls current_process() and then dereferences the result to
> find the breakpoint list.
>
> However, in our case, the current process has already exited, and so
> the current_process() call returns nullptr.
>
> It seems reasonable that in this case gdbserver should report a
> failure to remove the breakpoint, and this is what I've done -- by
> adding a nullptr check within find_gdb_breakpoint.
>
> The second problem I ran into was on the gdb side, as the process has
> already exited, but GDB has not yet acknowledged the exit event, the
> detach -- the 'D' packet in the above trace -- fails. This was being
> reported to the user with a 'Can't detach process' error. As the test
> actually calls detach from Python code, this error was then becoming a
> Python exception.
>
> Though clearly the detach has returned an error, and so, maybe, having
> GDB throw an error would be fine, I think in this case, there's a good
> argument that the remote error should be ignored -- if GDB tries to
> detach and gets back an error, and if there's a pending exit event for
> the pid we tried to detach, then just ignore the error and pretend the
> detach worked fine.
>
> Testing
> =======
>
> In order to test this issue I needed to ensure that the exit event
> arrives at the same time as the detach call. The window of
> opportunity for getting the exit to arrive is so small I've never
> managed to trigger this in real use.
>
> However, if we trigger both the exit and the detach from a single
> Python function then we never return to GDB's event loop, as such GDB
> never processes the exit event, and so the first time GDB gets a
> chance to see the exit is during the detach call. And so that is the
> approach I've taken for testing this patch.
> ---
> gdb/linux-nat.c | 19 ++-
> gdb/remote.c | 19 ++-
> .../main-thread-exit-during-detach.c | 59 ++++++++
> .../main-thread-exit-during-detach.exp | 143 ++++++++++++++++++
> gdbserver/mem-break.cc | 7 +
> 5 files changed, 239 insertions(+), 8 deletions(-)
> create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
> create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
>
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 73008a313c0..c3d62e0f85e 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -1426,10 +1426,11 @@ linux_nat_target::detach (inferior *inf, int from_tty)
>
> iterate_over_lwps (ptid_t (pid), detach_callback);
>
> - /* Only the initial process should be left right now. */
> - gdb_assert (num_lwps (pid) == 1);
> -
> - main_lwp = find_lwp_pid (ptid_t (pid));
> + /* We have detached from everything except the main thread now, so
> + should only have one thread left. However, in non-stop mode the
> + main thread might have exited, in which case we'll have no threads
> + left. */
> + gdb_assert (num_lwps (pid) <= 1);
>
> if (forks_exist_p ())
> {
> @@ -1443,10 +1444,14 @@ linux_nat_target::detach (inferior *inf, int from_tty)
> {
> target_announce_detach (from_tty);
>
> - /* Pass on any pending signal for the last LWP. */
> - int signo = get_detach_signal (main_lwp);
> + main_lwp = find_lwp_pid (ptid_t (pid));
> + if (main_lwp != nullptr)
> + {
> + /* Pass on any pending signal for the last LWP. */
> + int signo = get_detach_signal (main_lwp);
>
> - detach_one_lwp (main_lwp, &signo);
> + detach_one_lwp (main_lwp, &signo);
> + }
>
> detach_success (inf);
> }
> diff --git a/gdb/remote.c b/gdb/remote.c
> index ff3d7e5cd32..f61035a26b0 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -6126,7 +6126,24 @@ remote_target::remote_detach_pid (int pid)
> else if (rs->buf[0] == '\0')
> error (_("Remote doesn't know how to detach"));
> else
> - error (_("Can't detach process."));
> + {
> + /* It is possible that we have an unprocessed exit event for this
> + pid. If this is the case then we can ignore the failure to detach
> + and just pretend that the detach worked, as far as the user is
> + concerned, the process exited immediately after the detach. */
> + bool process_has_already_exited = false;
> + struct remote_notif_state *rns = rs->notif_state;
> + auto reply
> + = (struct stop_reply *) rns->pending_event[notif_client_stop.id];
> + if (reply != nullptr && reply->ptid.pid () == pid)
> + {
> + if (reply->ws.kind () == TARGET_WAITKIND_EXITED)
> + process_has_already_exited = true;
> + }
> +
> + if (!process_has_already_exited)
> + error (_("Can't detach process: %s"), (char *) rs->buf.data ());
> + }
> }
>
> /* This detaches a program to which we previously attached, using
> diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
> new file mode 100644
> index 00000000000..ecd0138c354
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
> @@ -0,0 +1,59 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> + Copyright 2023 Free Software Foundation, Inc.
> +
> + This program is free software; you can redistribute it and/or modify
> + it under the terms of the GNU General Public License as published by
> + the Free Software Foundation; either version 3 of the License, or
> + (at your option) any later version.
> +
> + This program is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + GNU General Public License for more details.
> +
> + You should have received a copy of the GNU General Public License
> + along with this program. If not, see <http://www.gnu.org/licenses/>. */
> +
> +#include <pthread.h>
> +#include <unistd.h>
> +
> +/* This is set from GDB to allow the main thread to exit. */
> +
> +volatile int dont_exit_just_yet = 1;
> +
> +/* Somewhere to place a breakpoint. */
> +
> +void
> +breakpt ()
> +{
> + /* Just spin. When the test is started under GDB we never enter the spin
> + loop, but when we attach, the worker thread will be spinning here. */
> + while (dont_exit_just_yet)
> + sleep (1);
> +}
> +
> +/* Thread function, doesn't do anything but hit a breakpoint. */
> +
> +void *
> +thread_worker (void *arg)
> +{
> + breakpt ();
> + return NULL;
> +}
> +
> +int
> +main ()
> +{
> + pthread_t thr;
> +
This test program is missing an 'alarm (300);' call, which I've now
added just here. Apologies for missing this before posting.
Thanks,
Andrew
> + /* Create a thread. */
> + pthread_create (&thr, NULL, thread_worker, NULL);
> + pthread_detach (thr);
> +
> + /* Spin until GDB releases us. */
> + while (dont_exit_just_yet)
> + sleep (1);
> +
> + _exit (0);
> +}
> diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
> new file mode 100644
> index 00000000000..97df3bd2783
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
> @@ -0,0 +1,143 @@
> +# Copyright 2023 Free Software Foundation, Inc.
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program. If not, see <http://www.gnu.org/licenses/>.
> +
> +# Check for a race condition where in non-stop mode, the user might
> +# have a thread other than the main (original) thread selected and use
> +# the 'detach' command.
> +#
> +# As GDB tries to detach it is possible that the main thread might
> +# exit, the main thread is still running due to non-stop mode.
> +#
> +# GDB used to assume that the main thread would always exist when
> +# processing the detach, clearly this isn't the case, and this
> +# assumption would lead to assertion failures and segfaults.
> +#
> +# Triggering the precise timing is pretty hard, we need the main
> +# thread to exit after the user has entered the 'detach' command, but
> +# before GDB enters the detach implementation and stops all threads,
> +# the window of opportunity for this bug is actually tiny.
> +#
> +# However, we can trigger this bug 100% from Python, as GDB's
> +# event-loop only kicks in once we return from a Python function.
> +# Thus, if we have a single Python function that causes the main
> +# thread to exit, and then calls detach GDB will not have a chance to
> +# handle the main thread exiting before entering the detach code.
> +
> +standard_testfile
> +
> +require allow_python_tests
> +
> +if [build_executable "failed to prepare" $testfile $srcfile {debug pthreads}] {
> + return -1
> +}
> +
> +# Run the test. When SPAWN_INFERIOR is true the inferior is started
> +# as a separate process which GDB then attaches too. When
> +# SPAWN_INFERIOR is false the inferior is started directly within GDB.
> +
> +proc run_test { spawn_inferior } {
> + global srcfile
> + global gdb_prompt
> + global GDBFLAGS
> +
> + save_vars { GDBFLAGS } {
> + append GDBFLAGS " -ex \"set non-stop on\""
> + clean_restart $::binfile
> + }
> +
> + # Setup the inferior. When complete the main thread (#1) will
> + # still be running (due to non-stop mode), while the worker thread
> + # (#2) will be stopped.
> + #
> + # There are two setup modes, when SPAWN_INFERIOR is true we span a
> + # separate process and attach to it, after the attach both threads
> + # are stopped, so it is necessary to resume thread #1.
> + #
> + # When SPAWN_INFERIOR is false we just start the inferior within
> + # GDB, in this case we place a breakpoint that will be hit by
> + # thread #2. When the breakpoint is hit thread #1 will remain
> + # running.
> + if {$spawn_inferior} {
> + set test_spawn_id [spawn_wait_for_attach $::binfile]
> + set testpid [spawn_id_get_pid $test_spawn_id]
> +
> + set escapedbinfile [string_to_regexp $::binfile]
> + gdb_test -no-prompt-anchor "attach $testpid" \
> + "Attaching to program.*`?$escapedbinfile'?, process $testpid.*" \
> + "attach to the inferior"
> +
> + # Attaching to a multi-threaded application in non-stop mode
> + # can result in thread stops being reported after the prompt
> + # is displayed.
> + #
> + # Send a simple command now just to resync the command prompt.
> + gdb_test "p 1 + 2" " = 3"
> +
> + # Set thread 1 (the current thread) running again.
> + gdb_test "continue&"
> + } else {
> + if {![runto_main]} {
> + return -1
> + }
> +
> + gdb_breakpoint "breakpt"
> + gdb_continue_to_breakpoint "run to breakpoint"
> + }
> +
> + # Switch to thread 2.
> + gdb_test "thread 2" \
> + [multi_line \
> + "Switching to thread 2\[^\r\n\]*" \
> + "#0\\s+.*"]
> +
> + # Create a Python function that sets a variable in the inferior and
> + # then detaches. Setting the variable in the inferior will allow the
> + # main thread to exit, we even sleep for a short while in order to
> + # give the inferior a chance to exit.
> + #
> + # However, we don't want GDB to notice the exit before we call detach,
> + # which is why we perform both these actions from a Python function.
> + gdb_test_multiline "Create worker function" \
> + "python" "" \
> + "import time" "" \
> + "def set_and_detach():" "" \
> + " gdb.execute(\"set variable dont_exit_just_yet=0\")" "" \
> + " time.sleep(1)" "" \
> + " gdb.execute(\"detach\")" "" \
> + "end" ""
> +
> + # The Python function performs two actions, the first causes the
> + # main thread to exit, while the second detaches from the inferior.
> + #
> + # In both cases the stop arrives while GDB is processing the
> + # detach, however, for remote targets GDB doesn't report the stop,
> + # while for local targets GDB does report the stop.
> + if {![gdb_is_target_remote]} {
> + set stop_re "\\\[Thread.*exited\\\]\r\n"
> + } else {
> + set stop_re ""
> + }
> + gdb_test "python set_and_detach()" \
> + "${stop_re}\\\[Inferior.*detached\\\]"
> +}
> +
> +foreach_with_prefix spawn_inferior { true false } {
> + if {$spawn_inferior && ![can_spawn_for_attach]} {
> + # If spawning (and attaching too) a separate inferior is not
> + # supported for the current board, then lets not try too.
> + continue
> + }
> +
> + run_test $spawn_inferior
> +}
> diff --git a/gdbserver/mem-break.cc b/gdbserver/mem-break.cc
> index 897b9a2273c..12143caf104 100644
> --- a/gdbserver/mem-break.cc
> +++ b/gdbserver/mem-break.cc
> @@ -976,6 +976,13 @@ static struct gdb_breakpoint *
> find_gdb_breakpoint (char z_type, CORE_ADDR addr, int kind)
> {
> struct process_info *proc = current_process ();
> +
> + /* GDB might not yet be aware that the current process has exited and so
> + still requests that we modify (delete) a breakpoint. Just claim we
> + can't find the breakpoint. */
> + if (proc == nullptr)
> + return nullptr;
> +
> struct breakpoint *bp;
> enum bkpt_type type = Z_packet_to_bkpt_type (z_type);
>
> --
> 2.25.4
next prev parent reply other threads:[~2023-07-18 14:40 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-18 14:31 [PATCH 0/2] Exit " Andrew Burgess
2023-07-18 14:31 ` [PATCH 1/2] gdb: fix possible nullptr dereference in a remote_debug_printf call Andrew Burgess
2023-07-19 15:01 ` Tom Tromey
2023-08-03 9:07 ` Andrew Burgess
2023-07-18 14:31 ` [PATCH 2/2] gdb: handle main thread exiting during detach Andrew Burgess
2023-07-18 14:40 ` Andrew Burgess [this message]
2023-08-03 9:50 ` [PATCHv2] " Andrew Burgess
2023-08-23 11:26 ` [PATCHv3] " Andrew Burgess
2023-09-29 13:57 ` [PATCHv4] " Andrew Burgess
2023-10-18 9:49 ` [PATCHv5] " Andrew Burgess
2023-10-25 21:40 ` Kevin Buettner
2023-10-26 17:26 ` Andrew Burgess
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87fs5lgu5t.fsf@redhat.com \
--to=aburgess@redhat.com \
--cc=gdb-patches@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).