Re: [PATCH 2/2] gdb: handle main thread exiting during detach

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

From: Andrew Burgess <aburgess@redhat.com>
To: gdb-patches@sourceware.org
Subject: Re: [PATCH 2/2] gdb: handle main thread exiting during detach
Date: Tue, 18 Jul 2023 15:40:14 +0100	[thread overview]
Message-ID: <87fs5lgu5t.fsf@redhat.com> (raw)
In-Reply-To: <19e8fbcdf634198ed3ab5dd3ee2e2b36c2353e27.1689690655.git.aburgess@redhat.com>

Andrew Burgess <aburgess@redhat.com> writes:

> Overview
> ========
>
> In non-stop mode, the main thread is running while a second thread is
> stopped.  The user has the second thread selected as the current
> thread and asks GDB to detach.  At the exact moment of detach the main
> thread exits.  This situation causes crashes, assertion failures, and
> unexpected errors to be reported from GDB for both native and remote
> targets.
>
> This commit addresses this situation for native and remote targets,
> but the fixes are all different.
>
> Native Linux Target
> ===================
>
> This is caught in linux_nat_target::detach and the stop_wait_callback
> notices that the thread has exited and cleans up.
>
> GDB then detaches from everything except the main thread by calling
> detach_callback.
>
> After this the first problem is this assert:
>
>   /* Only the initial process should be left right now.  */
>   gdb_assert (num_lwps (pid) == 1);
>
> The num_lwps call will return 0 as the main thread has exited and all
> of the other threads have been detached.  I fix this by changing the
> assert too:
>
>   gdb_assert (num_lwps (pid) <= 1);
>
> and improving the comment.
>
> The next problem is that we do:
>
>   main_lwp = find_lwp_pid (ptid_t (pid));
>
> and then proceed assuming that main_lwp is not nullptr, however, as
> the main thread has exited (num_lwps(pid) == 0), main_lwp is nullptr,
> and so we crash trying to dereference the nullptr.  I fix this by
> adding a nullptr check for main_lwp.  I think this is fine because the
> actions we are skipping all relate to detaching from the main thread,
> but we'll not be doing this as the thread has already exited!
>
> For Remote Targets
> ==================
>
> On remote targets there are two problems.  The first is that when the
> exit occurs during the early phase of the detach, we see the stop
> notification arrive while GDB is removing the breakpoints ahead of the
> detach.  The 'set debug remote on' trace looks like this:
>
>   [remote] Sending packet: $z0,7f1648fe0241,1#35
>   [remote]   Notification received: Stop:W0;process:2a0ac8
>   # At this point an unpatched gdbserver segfaults, and the connection
>   # is broken.  A patched gdbserver continues as below...
>   [remote] Packet received: E01
>   [remote] Sending packet: $z0,7f1648ff00a8,1#68
>   [remote] Packet received: E01
>   [remote] Sending packet: $z0,7f1648ff132f,1#6b
>   [remote] Packet received: E01
>   [remote] Sending packet: $D;2a0ac8#3e
>   [remote] Packet received: E01
>
> I was originally running into Segmentation Faults, from within
> gdbserver/mem-break.cc, in the function find_gdb_breakpoint.  This
> function calls current_process() and then dereferences the result to
> find the breakpoint list.
>
> However, in our case, the current process has already exited, and so
> the current_process() call returns nullptr.
>
> It seems reasonable that in this case gdbserver should report a
> failure to remove the breakpoint, and this is what I've done -- by
> adding a nullptr check within find_gdb_breakpoint.
>
> The second problem I ran into was on the gdb side, as the process has
> already exited, but GDB has not yet acknowledged the exit event, the
> detach -- the 'D' packet in the above trace -- fails.  This was being
> reported to the user with a 'Can't detach process' error.  As the test
> actually calls detach from Python code, this error was then becoming a
> Python exception.
>
> Though clearly the detach has returned an error, and so, maybe, having
> GDB throw an error would be fine, I think in this case, there's a good
> argument that the remote error should be ignored -- if GDB tries to
> detach and gets back an error, and if there's a pending exit event for
> the pid we tried to detach, then just ignore the error and pretend the
> detach worked fine.
>
> Testing
> =======
>
> In order to test this issue I needed to ensure that the exit event
> arrives at the same time as the detach call.  The window of
> opportunity for getting the exit to arrive is so small I've never
> managed to trigger this in real use.
>
> However, if we trigger both the exit and the detach from a single
> Python function then we never return to GDB's event loop, as such GDB
> never processes the exit event, and so the first time GDB gets a
> chance to see the exit is during the detach call.  And so that is the
> approach I've taken for testing this patch.
> ---
>  gdb/linux-nat.c                               |  19 ++-
>  gdb/remote.c                                  |  19 ++-
>  .../main-thread-exit-during-detach.c          |  59 ++++++++
>  .../main-thread-exit-during-detach.exp        | 143 ++++++++++++++++++
>  gdbserver/mem-break.cc                        |   7 +
>  5 files changed, 239 insertions(+), 8 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
>  create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
>
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 73008a313c0..c3d62e0f85e 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -1426,10 +1426,11 @@ linux_nat_target::detach (inferior *inf, int from_tty)
>  
>    iterate_over_lwps (ptid_t (pid), detach_callback);
>  
> -  /* Only the initial process should be left right now.  */
> -  gdb_assert (num_lwps (pid) == 1);
> -
> -  main_lwp = find_lwp_pid (ptid_t (pid));
> +  /* We have detached from everything except the main thread now, so
> +     should only have one thread left.  However, in non-stop mode the
> +     main thread might have exited, in which case we'll have no threads
> +     left.  */
> +  gdb_assert (num_lwps (pid) <= 1);
>  
>    if (forks_exist_p ())
>      {
> @@ -1443,10 +1444,14 @@ linux_nat_target::detach (inferior *inf, int from_tty)
>      {
>        target_announce_detach (from_tty);
>  
> -      /* Pass on any pending signal for the last LWP.  */
> -      int signo = get_detach_signal (main_lwp);
> +      main_lwp = find_lwp_pid (ptid_t (pid));
> +      if (main_lwp != nullptr)
> +	{
> +	  /* Pass on any pending signal for the last LWP.  */
> +	  int signo = get_detach_signal (main_lwp);
>  
> -      detach_one_lwp (main_lwp, &signo);
> +	  detach_one_lwp (main_lwp, &signo);
> +	}
>  
>        detach_success (inf);
>      }
> diff --git a/gdb/remote.c b/gdb/remote.c
> index ff3d7e5cd32..f61035a26b0 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -6126,7 +6126,24 @@ remote_target::remote_detach_pid (int pid)
>    else if (rs->buf[0] == '\0')
>      error (_("Remote doesn't know how to detach"));
>    else
> -    error (_("Can't detach process."));
> +    {
> +      /* It is possible that we have an unprocessed exit event for this
> +	 pid.  If this is the case then we can ignore the failure to detach
> +	 and just pretend that the detach worked, as far as the user is
> +	 concerned, the process exited immediately after the detach.  */
> +      bool process_has_already_exited = false;
> +      struct remote_notif_state *rns = rs->notif_state;
> +      auto reply
> +	= (struct stop_reply *) rns->pending_event[notif_client_stop.id];
> +      if (reply != nullptr && reply->ptid.pid () == pid)
> +	{
> +	  if (reply->ws.kind () == TARGET_WAITKIND_EXITED)
> +	    process_has_already_exited = true;
> +	}
> +
> +      if (!process_has_already_exited)
> +	error (_("Can't detach process: %s"), (char *) rs->buf.data ());
> +    }
>  }
>  
>  /* This detaches a program to which we previously attached, using
> diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
> new file mode 100644
> index 00000000000..ecd0138c354
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c
> @@ -0,0 +1,59 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright 2023 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +#include <pthread.h>
> +#include <unistd.h>
> +
> +/* This is set from GDB to allow the main thread to exit.  */
> +
> +volatile int dont_exit_just_yet = 1;
> +
> +/* Somewhere to place a breakpoint.  */
> +
> +void
> +breakpt ()
> +{
> +  /* Just spin.  When the test is started under GDB we never enter the spin
> +     loop, but when we attach, the worker thread will be spinning here.  */
> +  while (dont_exit_just_yet)
> +    sleep (1);
> +}
> +
> +/* Thread function, doesn't do anything but hit a breakpoint.  */
> +
> +void *
> +thread_worker (void *arg)
> +{
> +  breakpt ();
> +  return NULL;
> +}
> +
> +int
> +main ()
> +{
> +  pthread_t thr;
> +

This test program is missing an 'alarm (300);' call, which I've now
added just here.  Apologies for missing this before posting.

Thanks,
Andrew

> +  /* Create a thread.  */
> +  pthread_create (&thr, NULL, thread_worker, NULL);
> +  pthread_detach (thr);
> +
> +  /* Spin until GDB releases us.  */
> +  while (dont_exit_just_yet)
> +    sleep (1);
> +
> +  _exit (0);
> +}
> diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
> new file mode 100644
> index 00000000000..97df3bd2783
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp
> @@ -0,0 +1,143 @@
> +# Copyright 2023 Free Software Foundation, Inc.
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Check for a race condition where in non-stop mode, the user might
> +# have a thread other than the main (original) thread selected and use
> +# the 'detach' command.
> +#
> +# As GDB tries to detach it is possible that the main thread might
> +# exit, the main thread is still running due to non-stop mode.
> +#
> +# GDB used to assume that the main thread would always exist when
> +# processing the detach, clearly this isn't the case, and this
> +# assumption would lead to assertion failures and segfaults.
> +#
> +# Triggering the precise timing is pretty hard, we need the main
> +# thread to exit after the user has entered the 'detach' command, but
> +# before GDB enters the detach implementation and stops all threads,
> +# the window of opportunity for this bug is actually tiny.
> +#
> +# However, we can trigger this bug 100% from Python, as GDB's
> +# event-loop only kicks in once we return from a Python function.
> +# Thus, if we have a single Python function that causes the main
> +# thread to exit, and then calls detach GDB will not have a chance to
> +# handle the main thread exiting before entering the detach code.
> +
> +standard_testfile
> +
> +require allow_python_tests
> +
> +if [build_executable "failed to prepare" $testfile $srcfile {debug pthreads}] {
> +    return -1
> +}
> +
> +# Run the test.  When SPAWN_INFERIOR is true the inferior is started
> +# as a separate process which GDB then attaches too.  When
> +# SPAWN_INFERIOR is false the inferior is started directly within GDB.
> +
> +proc run_test { spawn_inferior } {
> +    global srcfile
> +    global gdb_prompt
> +    global GDBFLAGS
> +
> +    save_vars { GDBFLAGS } {
> +	append GDBFLAGS " -ex \"set non-stop on\""
> +	clean_restart $::binfile
> +    }
> +
> +    # Setup the inferior.  When complete the main thread (#1) will
> +    # still be running (due to non-stop mode), while the worker thread
> +    # (#2) will be stopped.
> +    #
> +    # There are two setup modes, when SPAWN_INFERIOR is true we span a
> +    # separate process and attach to it, after the attach both threads
> +    # are stopped, so it is necessary to resume thread #1.
> +    #
> +    # When SPAWN_INFERIOR is false we just start the inferior within
> +    # GDB, in this case we place a breakpoint that will be hit by
> +    # thread #2.  When the breakpoint is hit thread #1 will remain
> +    # running.
> +    if {$spawn_inferior} {
> +	set test_spawn_id [spawn_wait_for_attach $::binfile]
> +	set testpid [spawn_id_get_pid $test_spawn_id]
> +
> +	set escapedbinfile  [string_to_regexp $::binfile]
> +	gdb_test -no-prompt-anchor "attach $testpid" \
> +	    "Attaching to program.*`?$escapedbinfile'?, process $testpid.*" \
> +	    "attach to the inferior"
> +
> +	# Attaching to a multi-threaded application in non-stop mode
> +	# can result in thread stops being reported after the prompt
> +	# is displayed.
> +	#
> +	# Send a simple command now just to resync the command prompt.
> +	gdb_test "p 1 + 2" " = 3"
> +
> +	# Set thread 1 (the current thread) running again.
> +	gdb_test "continue&"
> +    } else {
> +	if {![runto_main]} {
> +	    return -1
> +	}
> +
> +	gdb_breakpoint "breakpt"
> +	gdb_continue_to_breakpoint "run to breakpoint"
> +    }
> +
> +    # Switch to thread 2.
> +    gdb_test "thread 2" \
> +	[multi_line \
> +	     "Switching to thread 2\[^\r\n\]*" \
> +	     "#0\\s+.*"]
> +
> +    # Create a Python function that sets a variable in the inferior and
> +    # then detaches.  Setting the variable in the inferior will allow the
> +    # main thread to exit, we even sleep for a short while in order to
> +    # give the inferior a chance to exit.
> +    #
> +    # However, we don't want GDB to notice the exit before we call detach,
> +    # which is why we perform both these actions from a Python function.
> +    gdb_test_multiline "Create worker function" \
> +	"python" "" \
> +	"import time" "" \
> +	"def set_and_detach():" "" \
> +	"   gdb.execute(\"set variable dont_exit_just_yet=0\")" "" \
> +	"   time.sleep(1)" "" \
> +	"   gdb.execute(\"detach\")" "" \
> +	"end" ""
> +
> +    # The Python function performs two actions, the first causes the
> +    # main thread to exit, while the second detaches from the inferior.
> +    #
> +    # In both cases the stop arrives while GDB is processing the
> +    # detach, however, for remote targets GDB doesn't report the stop,
> +    # while for local targets GDB does report the stop.
> +    if {![gdb_is_target_remote]} {
> +	set stop_re "\\\[Thread.*exited\\\]\r\n"
> +    } else {
> +	set stop_re ""
> +    }
> +    gdb_test "python set_and_detach()" \
> +	"${stop_re}\\\[Inferior.*detached\\\]"
> +}
> +
> +foreach_with_prefix spawn_inferior { true false } {
> +    if {$spawn_inferior && ![can_spawn_for_attach]} {
> +	# If spawning (and attaching too) a separate inferior is not
> +	# supported for the current board, then lets not try too.
> +	continue
> +    }
> +
> +    run_test $spawn_inferior
> +}
> diff --git a/gdbserver/mem-break.cc b/gdbserver/mem-break.cc
> index 897b9a2273c..12143caf104 100644
> --- a/gdbserver/mem-break.cc
> +++ b/gdbserver/mem-break.cc
> @@ -976,6 +976,13 @@ static struct gdb_breakpoint *
>  find_gdb_breakpoint (char z_type, CORE_ADDR addr, int kind)
>  {
>    struct process_info *proc = current_process ();
> +
> +  /* GDB might not yet be aware that the current process has exited and so
> +     still requests that we modify (delete) a breakpoint.  Just claim we
> +     can't find the breakpoint.  */
> +  if (proc == nullptr)
> +    return nullptr;
> +
>    struct breakpoint *bp;
>    enum bkpt_type type = Z_packet_to_bkpt_type (z_type);
>  
> -- 
> 2.25.4

next prev parent reply	other threads:[~2023-07-18 14:40 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-18 14:31 [PATCH 0/2] Exit " Andrew Burgess
2023-07-18 14:31 ` [PATCH 1/2] gdb: fix possible nullptr dereference in a remote_debug_printf call Andrew Burgess
2023-07-19 15:01   ` Tom Tromey
2023-08-03  9:07     ` Andrew Burgess
2023-07-18 14:31 ` [PATCH 2/2] gdb: handle main thread exiting during detach Andrew Burgess
2023-07-18 14:40   ` Andrew Burgess [this message]
2023-08-03  9:50   ` [PATCHv2] " Andrew Burgess
2023-08-23 11:26     ` [PATCHv3] " Andrew Burgess
2023-09-29 13:57       ` [PATCHv4] " Andrew Burgess
2023-10-18  9:49         ` [PATCHv5] " Andrew Burgess
2023-10-25 21:40           ` Kevin Buettner
2023-10-26 17:26             ` Andrew Burgess

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fs5lgu5t.fsf@redhat.com \
    --to=aburgess@redhat.com \
    --cc=gdb-patches@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).