From: Andrew Burgess <aburgess@redhat.com>
To: Kevin Buettner <kevinb@redhat.com>, gdb-patches@sourceware.org
Cc: Kevin Buettner <kevinb@redhat.com>
Subject: Re: [PATCH 1/2] linux-nat.c, linux-fork.c: Fix detach bug when lwp has exited/terminated
Date: Mon, 13 Nov 2023 10:58:12 +0000 [thread overview]
Message-ID: <87il65sywr.fsf@redhat.com> (raw)
In-Reply-To: <20231111223046.109727-2-kevinb@redhat.com>
Kevin Buettner <kevinb@redhat.com> writes:
> When using GDB on native linux, it can happen that, while attempting
> to detach an inferior, the inferior may have been exited or have been
> killed, yet still be in the list of lwps. Should that happen, the
> assert in x86_linux_update_debug_registers in
> gdb/nat/x86-linux-dregs.c will trigger. The line in question looks
> like this:
>
> gdb_assert (lwp_is_stopped (lwp));
>
> For this case, the lwp isn't stopped - it's dead.
>
> The bug which brought this problem to my attention is one in which the
> pwntools library uses GDB to to debug a process; as the script is
> shutting things down, it kills the process that GDB is debugging and
> also sends GDB a SIGTERM signal, which causes GDB to detach all
> inferiors prior to exiting. Here's a link to the bug:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=2192169
>
> The following shell command mimics part of what the pwntools
> reproducer script does (with regard to shutting things down), but
> reproduces the bug much less reliably. I have found it necessary to
> run the command a bunch of times before seeing the bug. (I usually
> see it within 5-10 repetitions.) If you choose to try this command,
> make sure that you have no running "cat" or "gdb" processes first!
>
> cat </dev/zero >/dev/null & \
> (sleep 5; (kill -KILL `pgrep cat` & kill -TERM `pgrep gdb`)) & \
> sleep 1 ; \
> gdb -q -iex 'set debuginfod enabled off' -ex 'set height 0' \
> -ex c /usr/bin/cat `pgrep cat`
>
> So, basically, the idea here is to kill both gdb and cat at roughly
> the same time. If we happen to attempt the detach before the process
> lwp has been deleted from GDB's (linux native) LWP data structures,
> then the assert will trigger. The relevant part of the backtrace
> looks like this:
>
> #8 0x00000000008a83ae in x86_linux_update_debug_registers (lwp=0x1873280)
> at gdb/nat/x86-linux-dregs.c:146
> #9 0x00000000008a862f in x86_linux_prepare_to_resume (lwp=0x1873280)
> at gdb/nat/x86-linux.c:81
> #10 0x000000000048ea42 in x86_linux_nat_target::low_prepare_to_resume (
> this=0x121eee0 <the_amd64_linux_nat_target>, lwp=0x1873280)
> at gdb/x86-linux-nat.h:70
> #11 0x000000000081a452 in detach_one_lwp (lp=0x1873280, signo_p=0x7fff8ca3441c)
> at gdb/linux-nat.c:1374
> #12 0x000000000081a85f in linux_nat_target::detach (
> this=0x121eee0 <the_amd64_linux_nat_target>, inf=0x16e8f70, from_tty=0)
> at gdb/linux-nat.c:1450
> #13 0x000000000083a23b in thread_db_target::detach (
> this=0x1206ae0 <the_thread_db_target>, inf=0x16e8f70, from_tty=0)
> at gdb/linux-thread-db.c:1385
> #14 0x0000000000a66722 in target_detach (inf=0x16e8f70, from_tty=0)
> at gdb/target.c:2526
> #15 0x0000000000a8f0ad in kill_or_detach (inf=0x16e8f70, from_tty=0)
> at gdb/top.c:1659
> #16 0x0000000000a8f4fa in quit_force (exit_arg=0x0, from_tty=0)
> at gdb/top.c:1762
> #17 0x000000000070829c in async_sigterm_handler (arg=0x0)
> at gdb/event-top.c:1141
>
> My colleague, Andrew Burgess, has done some recent work on other
> problems with detach. Upon hearing of this problem, he came up a test
> case which reliably reproduces the problem and tests for a few other
> problems as well. In addition to testing detach when the inferior has
> terminated due to a signal, it also tests detach when the inferior has
> exited normally. Andrew observed that the linux-native-only
> "checkpoint" command would be affected too, so the test also tests
> those cases when there's an active checkpoint.
>
> For the LWP exit / termination case with no checkpoint, that's handled
> via newly added checks of the waitstatus in detach_one_lwp in
> linux-nat.c.
>
> For the checkpoint detach problem, I chose to pass the lwp_info
> to linux_fork_detach in linux-fork.c. With that in place, suitable
> tests were added before attempting a PTRACE_DETACH operation.
>
> I added a few asserts at the beginning of linux_fork_detach and
> modified the caller code so that the newly added asserts shouldn't
> trigger. (That's what the 'pid == inferior_ptid.pid' check is about
> in gdb/linux-nat.c.)
>
> Finally, the #include for linux-fork.h had to be moved after that
> of linux-nat.c so that the type of lwp_info would be known.
>
> Lastly, I'll note that the checkpoint code needs some work with
> regard to background execution. This patch doesn't attempt to
> fix that problem, but it doesn't make it any worse. It does
> slightly improve the situation with detach because, due to the
> check noted above, linux_fork_detach() won't be called for the
> wrong inferior when there are multiple inferiors. (I haven't
> checked closely, but I don't think that the rest of the checkpoint
> code works correctly when there are multiple inferiors.)
> ---
> gdb/linux-fork.c | 22 ++++++++++++++++------
> gdb/linux-fork.h | 2 +-
> gdb/linux-nat.c | 17 +++++++++++++++--
> 3 files changed, 32 insertions(+), 9 deletions(-)
>
> diff --git a/gdb/linux-fork.c b/gdb/linux-fork.c
> index 52e385411c7..73d086938bd 100644
> --- a/gdb/linux-fork.c
> +++ b/gdb/linux-fork.c
> @@ -25,13 +25,14 @@
> #include "gdbcmd.h"
> #include "infcall.h"
> #include "objfiles.h"
> -#include "linux-fork.h"
> #include "linux-nat.h"
> +#include "linux-fork.h"
This change is only needed so that linux-fork.h sees the declaration of
lwp_info from linux-nat.h.
I'm firmly of the opinion that a header file should pull in all of its
dependencies, rather than relying on the #include order of its users.
So, instead of this change, I think you should really just add:
struct lwp_info;
to linux-fork.h.
> #include "gdbthread.h"
> #include "source.h"
>
> #include "nat/gdb_ptrace.h"
> #include "gdbsupport/gdb_wait.h"
> +#include "target/waitstatus.h"
> #include <dirent.h>
> #include <ctype.h>
>
> @@ -361,15 +362,24 @@ linux_fork_mourn_inferior (void)
> the first available. */
>
> void
> -linux_fork_detach (int from_tty)
> +linux_fork_detach (int from_tty, lwp_info *lp)
> {
> + gdb_assert (lp != nullptr);
> + gdb_assert (lp->ptid == inferior_ptid);
> +
> /* OK, inferior_ptid is the one we are detaching from. We need to
> delete it from the fork_list, and switch to the next available
> - fork. */
> + fork. But before doing the detach, do make sure that the lwp
> + hasn't exited or been terminated first. */
>
> - if (ptrace (PTRACE_DETACH, inferior_ptid.pid (), 0, 0))
> - error (_("Unable to detach %s"),
> - target_pid_to_str (inferior_ptid).c_str ());
> + if (lp->waitstatus.kind () != TARGET_WAITKIND_EXITED
> + && lp->waitstatus.kind () != TARGET_WAITKIND_THREAD_EXITED
> + && lp->waitstatus.kind () != TARGET_WAITKIND_SIGNALLED)
> + {
> + if (ptrace (PTRACE_DETACH, inferior_ptid.pid (), 0, 0))
> + error (_("Unable to detach %s"),
> + target_pid_to_str (inferior_ptid).c_str ());
> + }
>
> delete_fork (inferior_ptid);
>
> diff --git a/gdb/linux-fork.h b/gdb/linux-fork.h
> index 5a593fca91e..6e8f5ab7d0b 100644
> --- a/gdb/linux-fork.h
> +++ b/gdb/linux-fork.h
> @@ -25,7 +25,7 @@ extern void add_fork (pid_t);
> extern struct fork_info *find_fork_pid (pid_t);
> extern void linux_fork_killall (void);
> extern void linux_fork_mourn_inferior (void);
> -extern void linux_fork_detach (int);
> +extern void linux_fork_detach (int, lwp_info *);
> extern int forks_exist_p (void);
> extern int linux_fork_checkpointing_p (int);
>
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 1c9756c18bd..bf2da572de0 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -1354,6 +1354,19 @@ detach_one_lwp (struct lwp_info *lp, int *signo_p)
> lp->signalled = 0;
> }
>
> + /* If the lwp has exited or was terminated due to a signal, there's
> + nothing left to do. */
> + if (lp->waitstatus.kind () == TARGET_WAITKIND_EXITED
> + || lp->waitstatus.kind () == TARGET_WAITKIND_THREAD_EXITED
> + || lp->waitstatus.kind () == TARGET_WAITKIND_SIGNALLED)
> + {
> + linux_nat_debug_printf
> + ("Can't detach %s - it has exited or was terminated: %s.",
This line should be indented with a leading <tab>. The
binutils-gdb/.gitattributes file should result in this issue being
highlighted if you 'git show' this patch.
> + lp->ptid.to_string ().c_str (),
> + lp->waitstatus.to_string ().c_str ());
> + return;
I wonder if we should be calling `delete_lwp (lp->ptid);` before calling
return here?
I suspect that, at least in the test case you have, that this doesn't
really matter, we'll likely end up calling purge_lwp_list via
linux_nat_target::mourn_inferior ... but it still feels like we should
be making the call.
> + }
> +
> if (signo_p == NULL)
> {
> /* Pass on any pending signal for this LWP. */
> @@ -1427,13 +1440,13 @@ linux_nat_target::detach (inferior *inf, int from_tty)
> gdb_assert (num_lwps (pid) == 1
> || (target_is_non_stop_p () && num_lwps (pid) == 0));
>
> - if (forks_exist_p ())
> + if (pid == inferior_ptid.pid () && forks_exist_p ())
I see how this ties to the assert in linux_fork_detach. And given the
comments and use of inferior_ptid, I can see why you added the assert.
But I guess you added the check because some test triggered the assert.
Do you recall which test that was? I guess I'm curious what PID we are
trying to detach from when this condition triggers.
Thanks,
Andrew
> {
> /* Multi-fork case. The current inferior_ptid is being detached
> from, but there are other viable forks to debug. Detach from
> the current fork, and context-switch to the first
> available. */
> - linux_fork_detach (from_tty);
> + linux_fork_detach (from_tty, find_lwp_pid (ptid_t (pid)));
> }
> else
> {
> --
> 2.41.0
next prev parent reply other threads:[~2023-11-13 10:58 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-11 22:26 [PATCH 0/2] " Kevin Buettner
2023-11-11 22:26 ` [PATCH 1/2] linux-nat.c, linux-fork.c: " Kevin Buettner
2023-11-13 10:58 ` Andrew Burgess [this message]
2023-11-14 1:36 ` Kevin Buettner
2023-11-14 5:48 ` Kevin Buettner
2023-11-11 22:26 ` [PATCH 2/2] New test: gdb.base/process-dies-while-detaching.exp Kevin Buettner
2023-11-13 11:04 ` Andrew Burgess
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87il65sywr.fsf@redhat.com \
--to=aburgess@redhat.com \
--cc=gdb-patches@sourceware.org \
--cc=kevinb@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).