public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Andrew Burgess <aburgess@redhat.com>
To: Kevin Buettner <kevinb@redhat.com>, gdb-patches@sourceware.org
Cc: Kevin Buettner <kevinb@redhat.com>
Subject: Re: [PATCH 1/2] linux-nat.c, linux-fork.c: Fix detach bug when lwp has exited/terminated
Date: Mon, 13 Nov 2023 10:58:12 +0000	[thread overview]
Message-ID: <87il65sywr.fsf@redhat.com> (raw)
In-Reply-To: <20231111223046.109727-2-kevinb@redhat.com>

Kevin Buettner <kevinb@redhat.com> writes:

> When using GDB on native linux, it can happen that, while attempting
> to detach an inferior, the inferior may have been exited or have been
> killed, yet still be in the list of lwps.  Should that happen, the
> assert in x86_linux_update_debug_registers in
> gdb/nat/x86-linux-dregs.c will trigger.  The line in question looks
> like this:
>
>   gdb_assert (lwp_is_stopped (lwp));
>
> For this case, the lwp isn't stopped - it's dead.
>
> The bug which brought this problem to my attention is one in which the
> pwntools library uses GDB to to debug a process; as the script is
> shutting things down, it kills the process that GDB is debugging and
> also sends GDB a SIGTERM signal, which causes GDB to detach all
> inferiors prior to exiting.  Here's a link to the bug:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=2192169
>
> The following shell command mimics part of what the pwntools
> reproducer script does (with regard to shutting things down), but
> reproduces the bug much less reliably.  I have found it necessary to
> run the command a bunch of times before seeing the bug.  (I usually
> see it within 5-10 repetitions.)  If you choose to try this command,
> make sure that you have no running "cat" or "gdb" processes first!
>
>   cat </dev/zero >/dev/null & \
>   (sleep 5; (kill -KILL `pgrep cat` & kill -TERM `pgrep gdb`)) & \
>   sleep 1 ; \
>   gdb -q -iex 'set debuginfod enabled off' -ex 'set height 0' \
>       -ex c /usr/bin/cat `pgrep cat`
>
> So, basically, the idea here is to kill both gdb and cat at roughly
> the same time.  If we happen to attempt the detach before the process
> lwp has been deleted from GDB's (linux native) LWP data structures,
> then the assert will trigger.  The relevant part of the backtrace
> looks like this:
>
>   #8  0x00000000008a83ae in x86_linux_update_debug_registers (lwp=0x1873280)
>       at gdb/nat/x86-linux-dregs.c:146
>   #9  0x00000000008a862f in x86_linux_prepare_to_resume (lwp=0x1873280)
>       at gdb/nat/x86-linux.c:81
>   #10 0x000000000048ea42 in x86_linux_nat_target::low_prepare_to_resume (
>       this=0x121eee0 <the_amd64_linux_nat_target>, lwp=0x1873280)
>       at gdb/x86-linux-nat.h:70
>   #11 0x000000000081a452 in detach_one_lwp (lp=0x1873280, signo_p=0x7fff8ca3441c)
>       at gdb/linux-nat.c:1374
>   #12 0x000000000081a85f in linux_nat_target::detach (
>       this=0x121eee0 <the_amd64_linux_nat_target>, inf=0x16e8f70, from_tty=0)
>       at gdb/linux-nat.c:1450
>   #13 0x000000000083a23b in thread_db_target::detach (
>       this=0x1206ae0 <the_thread_db_target>, inf=0x16e8f70, from_tty=0)
>       at gdb/linux-thread-db.c:1385
>   #14 0x0000000000a66722 in target_detach (inf=0x16e8f70, from_tty=0)
>       at gdb/target.c:2526
>   #15 0x0000000000a8f0ad in kill_or_detach (inf=0x16e8f70, from_tty=0)
>       at gdb/top.c:1659
>   #16 0x0000000000a8f4fa in quit_force (exit_arg=0x0, from_tty=0)
>       at gdb/top.c:1762
>   #17 0x000000000070829c in async_sigterm_handler (arg=0x0)
>       at gdb/event-top.c:1141
>
> My colleague, Andrew Burgess, has done some recent work on other
> problems with detach.  Upon hearing of this problem, he came up a test
> case which reliably reproduces the problem and tests for a few other
> problems as well.  In addition to testing detach when the inferior has
> terminated due to a signal, it also tests detach when the inferior has
> exited normally.  Andrew observed that the linux-native-only
> "checkpoint" command would be affected too, so the test also tests
> those cases when there's an active checkpoint.
>
> For the LWP exit / termination case with no checkpoint, that's handled
> via newly added checks of the waitstatus in detach_one_lwp in
> linux-nat.c.
>
> For the checkpoint detach problem, I chose to pass the lwp_info
> to linux_fork_detach in linux-fork.c.  With that in place, suitable
> tests were added before attempting a PTRACE_DETACH operation.
>
> I added a few asserts at the beginning of linux_fork_detach and
> modified the caller code so that the newly added asserts shouldn't
> trigger.  (That's what the 'pid == inferior_ptid.pid' check is about
> in gdb/linux-nat.c.)
>
> Finally, the #include for linux-fork.h had to be moved after that
> of linux-nat.c so that the type of lwp_info would be known.
>
> Lastly, I'll note that the checkpoint code needs some work with
> regard to background execution.  This patch doesn't attempt to
> fix that problem, but it doesn't make it any worse.  It does
> slightly improve the situation with detach because, due to the
> check noted above, linux_fork_detach() won't be called for the
> wrong inferior when there are multiple inferiors.  (I haven't
> checked closely, but I don't think that the rest of the checkpoint
> code works correctly when there are multiple inferiors.)
> ---
>  gdb/linux-fork.c | 22 ++++++++++++++++------
>  gdb/linux-fork.h |  2 +-
>  gdb/linux-nat.c  | 17 +++++++++++++++--
>  3 files changed, 32 insertions(+), 9 deletions(-)
>
> diff --git a/gdb/linux-fork.c b/gdb/linux-fork.c
> index 52e385411c7..73d086938bd 100644
> --- a/gdb/linux-fork.c
> +++ b/gdb/linux-fork.c
> @@ -25,13 +25,14 @@
>  #include "gdbcmd.h"
>  #include "infcall.h"
>  #include "objfiles.h"
> -#include "linux-fork.h"
>  #include "linux-nat.h"
> +#include "linux-fork.h"

This change is only needed so that linux-fork.h sees the declaration of
lwp_info from linux-nat.h.

I'm firmly of the opinion that a header file should pull in all of its
dependencies, rather than relying on the #include order of its users.

So, instead of this change, I think you should really just add:

  struct lwp_info;

to linux-fork.h.

>  #include "gdbthread.h"
>  #include "source.h"
>  
>  #include "nat/gdb_ptrace.h"
>  #include "gdbsupport/gdb_wait.h"
> +#include "target/waitstatus.h"
>  #include <dirent.h>
>  #include <ctype.h>
>  
> @@ -361,15 +362,24 @@ linux_fork_mourn_inferior (void)
>     the first available.  */
>  
>  void
> -linux_fork_detach (int from_tty)
> +linux_fork_detach (int from_tty, lwp_info *lp)
>  {
> +  gdb_assert (lp != nullptr);
> +  gdb_assert (lp->ptid == inferior_ptid);
> +
>    /* OK, inferior_ptid is the one we are detaching from.  We need to
>       delete it from the fork_list, and switch to the next available
> -     fork.  */
> +     fork.  But before doing the detach, do make sure that the lwp
> +     hasn't exited or been terminated first.  */
>  
> -  if (ptrace (PTRACE_DETACH, inferior_ptid.pid (), 0, 0))
> -    error (_("Unable to detach %s"),
> -	   target_pid_to_str (inferior_ptid).c_str ());
> +  if (lp->waitstatus.kind () != TARGET_WAITKIND_EXITED
> +      && lp->waitstatus.kind () != TARGET_WAITKIND_THREAD_EXITED
> +      && lp->waitstatus.kind () != TARGET_WAITKIND_SIGNALLED)
> +    {
> +      if (ptrace (PTRACE_DETACH, inferior_ptid.pid (), 0, 0))
> +	error (_("Unable to detach %s"),
> +	       target_pid_to_str (inferior_ptid).c_str ());
> +    }
>  
>    delete_fork (inferior_ptid);
>  
> diff --git a/gdb/linux-fork.h b/gdb/linux-fork.h
> index 5a593fca91e..6e8f5ab7d0b 100644
> --- a/gdb/linux-fork.h
> +++ b/gdb/linux-fork.h
> @@ -25,7 +25,7 @@ extern void add_fork (pid_t);
>  extern struct fork_info *find_fork_pid (pid_t);
>  extern void linux_fork_killall (void);
>  extern void linux_fork_mourn_inferior (void);
> -extern void linux_fork_detach (int);
> +extern void linux_fork_detach (int, lwp_info *);
>  extern int forks_exist_p (void);
>  extern int linux_fork_checkpointing_p (int);
>  
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 1c9756c18bd..bf2da572de0 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -1354,6 +1354,19 @@ detach_one_lwp (struct lwp_info *lp, int *signo_p)
>        lp->signalled = 0;
>      }
>  
> +  /* If the lwp has exited or was terminated due to a signal, there's
> +     nothing left to do.  */
> +  if (lp->waitstatus.kind () == TARGET_WAITKIND_EXITED
> +      || lp->waitstatus.kind () == TARGET_WAITKIND_THREAD_EXITED
> +      || lp->waitstatus.kind () == TARGET_WAITKIND_SIGNALLED)
> +    {
> +      linux_nat_debug_printf
> +        ("Can't detach %s - it has exited or was terminated: %s.",

This line should be indented with a leading <tab>.  The
binutils-gdb/.gitattributes file should result in this issue being
highlighted if you 'git show' this patch.

> +	 lp->ptid.to_string ().c_str (),
> +	 lp->waitstatus.to_string ().c_str ());
> +      return;

I wonder if we should be calling `delete_lwp (lp->ptid);` before calling
return here?

I suspect that, at least in the test case you have, that this doesn't
really matter, we'll likely end up calling purge_lwp_list via
linux_nat_target::mourn_inferior ... but it still feels like we should
be making the call.

> +    }
> +
>    if (signo_p == NULL)
>      {
>        /* Pass on any pending signal for this LWP.  */
> @@ -1427,13 +1440,13 @@ linux_nat_target::detach (inferior *inf, int from_tty)
>    gdb_assert (num_lwps (pid) == 1
>  	      || (target_is_non_stop_p () && num_lwps (pid) == 0));
>  
> -  if (forks_exist_p ())
> +  if (pid == inferior_ptid.pid () && forks_exist_p ())

I see how this ties to the assert in linux_fork_detach.  And given the
comments and use of inferior_ptid, I can see why you added the assert.

But I guess you added the check because some test triggered the assert.
Do you recall which test that was?  I guess I'm curious what PID we are
trying to detach from when this condition triggers.

Thanks,
Andrew

>      {
>        /* Multi-fork case.  The current inferior_ptid is being detached
>  	 from, but there are other viable forks to debug.  Detach from
>  	 the current fork, and context-switch to the first
>  	 available.  */
> -      linux_fork_detach (from_tty);
> +      linux_fork_detach (from_tty, find_lwp_pid (ptid_t (pid)));
>      }
>    else
>      {
> -- 
> 2.41.0


  reply	other threads:[~2023-11-13 10:58 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-11 22:26 [PATCH 0/2] " Kevin Buettner
2023-11-11 22:26 ` [PATCH 1/2] linux-nat.c, linux-fork.c: " Kevin Buettner
2023-11-13 10:58   ` Andrew Burgess [this message]
2023-11-14  1:36     ` Kevin Buettner
2023-11-14  5:48       ` Kevin Buettner
2023-11-11 22:26 ` [PATCH 2/2] New test: gdb.base/process-dies-while-detaching.exp Kevin Buettner
2023-11-13 11:04   ` Andrew Burgess

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87il65sywr.fsf@redhat.com \
    --to=aburgess@redhat.com \
    --cc=gdb-patches@sourceware.org \
    --cc=kevinb@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).