From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id E5B0D3858C52 for ; Mon, 13 Nov 2023 10:58:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E5B0D3858C52 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E5B0D3858C52 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699873100; cv=none; b=QHapAjmblHuQgh9idGMF3UELPBsIh0nkheQV9VlZhQwigsfP38zy8x2mtlS57s65+JWFtW9Np8ascccmC/KWy8DVSeuFL8iDf9KIEsLMpuyl9IcS9fjzTDNWIEwF8Eagv/JfdBXveDIJmx2nzeggmA571boJWO5++MLxg5zhoPw= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699873100; c=relaxed/simple; bh=+UYv5carbIWncdtOi5w8M6V94JhGsgEmZmxHxf7swl0=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=vw5ZANDNmjtzNp1WEPHQRWYmzbEfA2+HifLGbgvaK9kIWjqWqkFZAVM6goO8ZlfFrLTDfaRZQGArcyND1fu6Ux2jK3k9gs+FCyS4kNOB5MwmM9oCSopmPcqsXkIvW9KaRjFEDm7D+awLL+IaKbmHOYKhXYXqNj7k7vOBXgMKtuQ= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1699873097; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=DXC7Tavwrt6AJ/hZ3haKmtP/jHrWVyCYMOfLqpDvFRE=; b=EklvbdGE3faT8Kede7QtJ0jzQ9qdCLlx/4r37k4iR5AAqFNPfjqrtyFPJYVX084aWbZstP i27OX3BC48Di2RmCKYeS6hpWEu7b9Ubno6EVVvbFg5ZtxfCmZQshr3FC6nMviWlRINnPlC BY5ogot4qupKKQt5OwfXnWJSL8IFNFo= Received: from mail-lj1-f197.google.com (mail-lj1-f197.google.com [209.85.208.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-650-0RuklbqtNVq41w-ML6sSiQ-1; Mon, 13 Nov 2023 05:58:16 -0500 X-MC-Unique: 0RuklbqtNVq41w-ML6sSiQ-1 Received: by mail-lj1-f197.google.com with SMTP id 38308e7fff4ca-2c520e0a9a7so38635801fa.3 for ; Mon, 13 Nov 2023 02:58:15 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699873094; x=1700477894; h=mime-version:message-id:date:references:in-reply-to:subject:cc:to :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=DXC7Tavwrt6AJ/hZ3haKmtP/jHrWVyCYMOfLqpDvFRE=; b=YyA14MpgKT0WEh9VALccVkPbi1gphHznS4j65OhC9bgRe8CYvF7C5+qxm4AqCy7aTb ZEKUSepFjQgS1M1kt1O2jgkvPJJc3J7A8lorkOR49cwASqTGRamTgEDxyA5jmrvX3EtB AgRbxcOFIy+xBMJ+6thDqhE6+GAFCxnLDRRs1/ZbK31B2J+k1RZzKQSo7m5SZHJXpLpo 5JBsCkQ1IXmhmQ+LiNUrOByVp+CIg1HCUz8sGArBERrWgglFuUcaXN2w7/BVOux47MBv D5bOG4mGBAYXn/ihtWtBDsg3FeOj3ofwxXGS9C3//i76cpQrh1s3wBgPN0iel6BVJM2g XhjA== X-Gm-Message-State: AOJu0YxY8hvktvRM3vC6VjeZma1tKNA8Vsg/CEQ3+ElMydzeOpRAprjy 7si9aNd+Quxs9LERfINnMDGDlXSbi3I83viS98ONlQJbm05PAJAGJgw1Lm8JZOAoO11espXzglO +fa8t2Y2R9YJDnKommzbcJ32FuXsMxA== X-Received: by 2002:a2e:a592:0:b0:2c8:3b99:7f08 with SMTP id m18-20020a2ea592000000b002c83b997f08mr3885248ljp.0.1699873094263; Mon, 13 Nov 2023 02:58:14 -0800 (PST) X-Google-Smtp-Source: AGHT+IEnQS5S7BbEth7AhJHC4iMlXlS5KDjBwLM8EXqIzL1uigjP+fCNy8W1Y3yS0YdjNifXJ4uWiw== X-Received: by 2002:a2e:a592:0:b0:2c8:3b99:7f08 with SMTP id m18-20020a2ea592000000b002c83b997f08mr3885232ljp.0.1699873093754; Mon, 13 Nov 2023 02:58:13 -0800 (PST) Received: from localhost (105.226.159.143.dyn.plus.net. [143.159.226.105]) by smtp.gmail.com with ESMTPSA id fm6-20020a05600c0c0600b004063d8b43e7sm13559445wmb.48.2023.11.13.02.58.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 13 Nov 2023 02:58:13 -0800 (PST) From: Andrew Burgess To: Kevin Buettner , gdb-patches@sourceware.org Cc: Kevin Buettner Subject: Re: [PATCH 1/2] linux-nat.c, linux-fork.c: Fix detach bug when lwp has exited/terminated In-Reply-To: <20231111223046.109727-2-kevinb@redhat.com> References: <20231111223046.109727-1-kevinb@redhat.com> <20231111223046.109727-2-kevinb@redhat.com> Date: Mon, 13 Nov 2023 10:58:12 +0000 Message-ID: <87il65sywr.fsf@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain X-Spam-Status: No, score=-11.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Kevin Buettner writes: > When using GDB on native linux, it can happen that, while attempting > to detach an inferior, the inferior may have been exited or have been > killed, yet still be in the list of lwps. Should that happen, the > assert in x86_linux_update_debug_registers in > gdb/nat/x86-linux-dregs.c will trigger. The line in question looks > like this: > > gdb_assert (lwp_is_stopped (lwp)); > > For this case, the lwp isn't stopped - it's dead. > > The bug which brought this problem to my attention is one in which the > pwntools library uses GDB to to debug a process; as the script is > shutting things down, it kills the process that GDB is debugging and > also sends GDB a SIGTERM signal, which causes GDB to detach all > inferiors prior to exiting. Here's a link to the bug: > > https://bugzilla.redhat.com/show_bug.cgi?id=2192169 > > The following shell command mimics part of what the pwntools > reproducer script does (with regard to shutting things down), but > reproduces the bug much less reliably. I have found it necessary to > run the command a bunch of times before seeing the bug. (I usually > see it within 5-10 repetitions.) If you choose to try this command, > make sure that you have no running "cat" or "gdb" processes first! > > cat /dev/null & \ > (sleep 5; (kill -KILL `pgrep cat` & kill -TERM `pgrep gdb`)) & \ > sleep 1 ; \ > gdb -q -iex 'set debuginfod enabled off' -ex 'set height 0' \ > -ex c /usr/bin/cat `pgrep cat` > > So, basically, the idea here is to kill both gdb and cat at roughly > the same time. If we happen to attempt the detach before the process > lwp has been deleted from GDB's (linux native) LWP data structures, > then the assert will trigger. The relevant part of the backtrace > looks like this: > > #8 0x00000000008a83ae in x86_linux_update_debug_registers (lwp=0x1873280) > at gdb/nat/x86-linux-dregs.c:146 > #9 0x00000000008a862f in x86_linux_prepare_to_resume (lwp=0x1873280) > at gdb/nat/x86-linux.c:81 > #10 0x000000000048ea42 in x86_linux_nat_target::low_prepare_to_resume ( > this=0x121eee0 , lwp=0x1873280) > at gdb/x86-linux-nat.h:70 > #11 0x000000000081a452 in detach_one_lwp (lp=0x1873280, signo_p=0x7fff8ca3441c) > at gdb/linux-nat.c:1374 > #12 0x000000000081a85f in linux_nat_target::detach ( > this=0x121eee0 , inf=0x16e8f70, from_tty=0) > at gdb/linux-nat.c:1450 > #13 0x000000000083a23b in thread_db_target::detach ( > this=0x1206ae0 , inf=0x16e8f70, from_tty=0) > at gdb/linux-thread-db.c:1385 > #14 0x0000000000a66722 in target_detach (inf=0x16e8f70, from_tty=0) > at gdb/target.c:2526 > #15 0x0000000000a8f0ad in kill_or_detach (inf=0x16e8f70, from_tty=0) > at gdb/top.c:1659 > #16 0x0000000000a8f4fa in quit_force (exit_arg=0x0, from_tty=0) > at gdb/top.c:1762 > #17 0x000000000070829c in async_sigterm_handler (arg=0x0) > at gdb/event-top.c:1141 > > My colleague, Andrew Burgess, has done some recent work on other > problems with detach. Upon hearing of this problem, he came up a test > case which reliably reproduces the problem and tests for a few other > problems as well. In addition to testing detach when the inferior has > terminated due to a signal, it also tests detach when the inferior has > exited normally. Andrew observed that the linux-native-only > "checkpoint" command would be affected too, so the test also tests > those cases when there's an active checkpoint. > > For the LWP exit / termination case with no checkpoint, that's handled > via newly added checks of the waitstatus in detach_one_lwp in > linux-nat.c. > > For the checkpoint detach problem, I chose to pass the lwp_info > to linux_fork_detach in linux-fork.c. With that in place, suitable > tests were added before attempting a PTRACE_DETACH operation. > > I added a few asserts at the beginning of linux_fork_detach and > modified the caller code so that the newly added asserts shouldn't > trigger. (That's what the 'pid == inferior_ptid.pid' check is about > in gdb/linux-nat.c.) > > Finally, the #include for linux-fork.h had to be moved after that > of linux-nat.c so that the type of lwp_info would be known. > > Lastly, I'll note that the checkpoint code needs some work with > regard to background execution. This patch doesn't attempt to > fix that problem, but it doesn't make it any worse. It does > slightly improve the situation with detach because, due to the > check noted above, linux_fork_detach() won't be called for the > wrong inferior when there are multiple inferiors. (I haven't > checked closely, but I don't think that the rest of the checkpoint > code works correctly when there are multiple inferiors.) > --- > gdb/linux-fork.c | 22 ++++++++++++++++------ > gdb/linux-fork.h | 2 +- > gdb/linux-nat.c | 17 +++++++++++++++-- > 3 files changed, 32 insertions(+), 9 deletions(-) > > diff --git a/gdb/linux-fork.c b/gdb/linux-fork.c > index 52e385411c7..73d086938bd 100644 > --- a/gdb/linux-fork.c > +++ b/gdb/linux-fork.c > @@ -25,13 +25,14 @@ > #include "gdbcmd.h" > #include "infcall.h" > #include "objfiles.h" > -#include "linux-fork.h" > #include "linux-nat.h" > +#include "linux-fork.h" This change is only needed so that linux-fork.h sees the declaration of lwp_info from linux-nat.h. I'm firmly of the opinion that a header file should pull in all of its dependencies, rather than relying on the #include order of its users. So, instead of this change, I think you should really just add: struct lwp_info; to linux-fork.h. > #include "gdbthread.h" > #include "source.h" > > #include "nat/gdb_ptrace.h" > #include "gdbsupport/gdb_wait.h" > +#include "target/waitstatus.h" > #include > #include > > @@ -361,15 +362,24 @@ linux_fork_mourn_inferior (void) > the first available. */ > > void > -linux_fork_detach (int from_tty) > +linux_fork_detach (int from_tty, lwp_info *lp) > { > + gdb_assert (lp != nullptr); > + gdb_assert (lp->ptid == inferior_ptid); > + > /* OK, inferior_ptid is the one we are detaching from. We need to > delete it from the fork_list, and switch to the next available > - fork. */ > + fork. But before doing the detach, do make sure that the lwp > + hasn't exited or been terminated first. */ > > - if (ptrace (PTRACE_DETACH, inferior_ptid.pid (), 0, 0)) > - error (_("Unable to detach %s"), > - target_pid_to_str (inferior_ptid).c_str ()); > + if (lp->waitstatus.kind () != TARGET_WAITKIND_EXITED > + && lp->waitstatus.kind () != TARGET_WAITKIND_THREAD_EXITED > + && lp->waitstatus.kind () != TARGET_WAITKIND_SIGNALLED) > + { > + if (ptrace (PTRACE_DETACH, inferior_ptid.pid (), 0, 0)) > + error (_("Unable to detach %s"), > + target_pid_to_str (inferior_ptid).c_str ()); > + } > > delete_fork (inferior_ptid); > > diff --git a/gdb/linux-fork.h b/gdb/linux-fork.h > index 5a593fca91e..6e8f5ab7d0b 100644 > --- a/gdb/linux-fork.h > +++ b/gdb/linux-fork.h > @@ -25,7 +25,7 @@ extern void add_fork (pid_t); > extern struct fork_info *find_fork_pid (pid_t); > extern void linux_fork_killall (void); > extern void linux_fork_mourn_inferior (void); > -extern void linux_fork_detach (int); > +extern void linux_fork_detach (int, lwp_info *); > extern int forks_exist_p (void); > extern int linux_fork_checkpointing_p (int); > > diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c > index 1c9756c18bd..bf2da572de0 100644 > --- a/gdb/linux-nat.c > +++ b/gdb/linux-nat.c > @@ -1354,6 +1354,19 @@ detach_one_lwp (struct lwp_info *lp, int *signo_p) > lp->signalled = 0; > } > > + /* If the lwp has exited or was terminated due to a signal, there's > + nothing left to do. */ > + if (lp->waitstatus.kind () == TARGET_WAITKIND_EXITED > + || lp->waitstatus.kind () == TARGET_WAITKIND_THREAD_EXITED > + || lp->waitstatus.kind () == TARGET_WAITKIND_SIGNALLED) > + { > + linux_nat_debug_printf > + ("Can't detach %s - it has exited or was terminated: %s.", This line should be indented with a leading . The binutils-gdb/.gitattributes file should result in this issue being highlighted if you 'git show' this patch. > + lp->ptid.to_string ().c_str (), > + lp->waitstatus.to_string ().c_str ()); > + return; I wonder if we should be calling `delete_lwp (lp->ptid);` before calling return here? I suspect that, at least in the test case you have, that this doesn't really matter, we'll likely end up calling purge_lwp_list via linux_nat_target::mourn_inferior ... but it still feels like we should be making the call. > + } > + > if (signo_p == NULL) > { > /* Pass on any pending signal for this LWP. */ > @@ -1427,13 +1440,13 @@ linux_nat_target::detach (inferior *inf, int from_tty) > gdb_assert (num_lwps (pid) == 1 > || (target_is_non_stop_p () && num_lwps (pid) == 0)); > > - if (forks_exist_p ()) > + if (pid == inferior_ptid.pid () && forks_exist_p ()) I see how this ties to the assert in linux_fork_detach. And given the comments and use of inferior_ptid, I can see why you added the assert. But I guess you added the check because some test triggered the assert. Do you recall which test that was? I guess I'm curious what PID we are trying to detach from when this condition triggers. Thanks, Andrew > { > /* Multi-fork case. The current inferior_ptid is being detached > from, but there are other viable forks to debug. Detach from > the current fork, and context-switch to the first > available. */ > - linux_fork_detach (from_tty); > + linux_fork_detach (from_tty, find_lwp_pid (ptid_t (pid))); > } > else > { > -- > 2.41.0