From: Pedro Alves
To: gdb-patches@sourceware.org
Subject: [PATCH v2 03/29] gdb/linux: Delete all other LWPs immediately on ptrace exec event
Date: Wed, 13 Jul 2022 23:24:07 +0100
Message-Id: <20220713222433.374898-4-pedro@palves.net>
In-Reply-To: <20220713222433.374898-1-pedro@palves.net>
References: <20220713222433.374898-1-pedro@palves.net>

I noticed that after a following patch ("Step over clone syscall w/
breakpoint, TARGET_WAITKIND_THREAD_CLONED"), the
gdb.threads/step-over-exec.exp testcase was passing cleanly, but we'd
still end up with four new unexpected GDB core dumps:

                 === gdb Summary ===

  # of unexpected core files      4
  # of expected passes            48

That patch makes the pre-existing gdb.threads/step-over-exec.exp
testcase (almost silently) expose a latent problem in gdb/linux-nat.c,
resulting in a GDB crash when:

  #1 - a non-leader thread execs
  #2 - the post-exec program stops somewhere
  #3 - you kill the inferior

Instead of doing #3 directly, the testcase just returns, which ends up
in gdb_exit. That tears down GDB, which kills the inferior, and is
thus equivalent to #3 above.

For example:

  $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
  ...
  (top-gdb) r
  ...
  (gdb) b main
  ...
  (gdb) r
  ...
  Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
  69        argv0 = argv[0];
  (gdb) c
  Continuing.
  [New Thread 0x7ffff7d89700 (LWP 2506975)]
  Other going in exec.
  Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
  process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd

  Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
  28        foo ();
  (gdb) k
  ...
  Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
  393         return m_suspend.waitstatus_pending_p;
  (top-gdb) bt
  #0  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
  #1  0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
  #2  0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
  #3  0x0000555555a92a26 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
  #4  0x0000555555a92a51 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
  #5  0x0000555555a91f84 in gdb::function_view::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
  #6  0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
  #7  0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 ) at ../../src/gdb/linux-nat.c:3590
  #8  0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
  ...

The root of the problem is that when a non-leader LWP execs, it just
changes its tid to the tgid, replacing the pre-exec leader thread and
becoming the new leader. There's no thread exit event for the execing
thread. It's as if the old pre-exec LWP vanishes without a trace. The
ptrace man page says:

 "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
     Stop the tracee at the next execve(2). A waitpid(2) by the
     tracer will return a status value such that

       status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))

     If the execing thread is not a thread group leader, the thread
     ID is reset to thread group leader's ID before this stop.
     Since Linux 3.0, the former thread ID can be retrieved with
     PTRACE_GETEVENTMSG."

When the core of GDB processes an exec event, it deletes all the
threads of the inferior. But that is too late: deleting the thread
does not delete the corresponding LWP, so we end up leaving the
pre-exec non-leader LWP stale in the LWP list. That's what leads to
the crash above: linux_nat_target::kill iterates over all LWPs, and
after the patch in question, that code looks up the corresponding
thread_info for each LWP. For the pre-exec non-leader LWP still
listed, it won't find one, and ends up calling
thread_info::has_pending_waitstatus on a null pointer (this=0x0).

This patch fixes it by deleting the pre-exec non-leader LWP (and
thread) from the LWP/thread lists as soon as we get an exec event out
of ptrace.

GDBserver does not need an equivalent fix, because it already does
this, as a side effect of mourning the pre-exec process, in
gdbserver/linux-low.cc:

  else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
    {
      ...
      /* Delete the execing process and all its threads.  */
      mourn (proc);
      switch_to_thread (nullptr);

Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
---
 gdb/linux-nat.c                              | 15 +++++++++++++++
 gdb/testsuite/gdb.threads/step-over-exec.exp |  6 ++++++
 2 files changed, 21 insertions(+)

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 57cab4ce50a..e27cc890ff5 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1987,6 +1987,21 @@ linux_handle_extended_wait (struct lwp_info *lp, int status)
 	 thread execs, it changes its tid to the tgid, and the old
 	 tgid thread might have not been resumed.  */
       lp->resumed = 1;
+
+      /* All other LWPs are gone now.  We'll have received a thread
+	 exit notification for all threads other the execing one.
+	 That one, if it wasn't the leader, just silently changes its
+	 tid to the tgid, and the previous leader vanishes.  Since
+	 Linux 3.0, the former thread ID can be retrieved with
+	 PTRACE_GETEVENTMSG, but since we support older kernels, don't
+	 bother with it, and just walk the LWP list.  Even with
+	 PTRACE_GETEVENTMSG, we'd still need to lookup the
+	 corresponding LWP object, and it would be an extra ptrace
+	 syscall, so this way may even be more efficient.  */
+      for (lwp_info *other_lp : all_lwps_safe ())
+	if (other_lp != lp && other_lp->ptid.pid () == lp->ptid.pid ())
+	  exit_lwp (other_lp);
+
       return 0;
     }
 
diff --git a/gdb/testsuite/gdb.threads/step-over-exec.exp b/gdb/testsuite/gdb.threads/step-over-exec.exp
index 783f865585c..a8b01f8aeda 100644
--- a/gdb/testsuite/gdb.threads/step-over-exec.exp
+++ b/gdb/testsuite/gdb.threads/step-over-exec.exp
@@ -102,6 +102,12 @@ proc do_test { execr_thread different_text_segments displaced_stepping } {
     gdb_breakpoint foo
     gdb_test "continue" "Breakpoint $decimal, foo .*" \
 	"continue to foo"
+
+    # Test that GDB is able to kill the inferior.  This may fail if
+    # e.g., GDB does not dispose of the pre-exec threads properly.
+    gdb_test "with confirm off -- kill" \
+	"\\\[Inferior 1 (.*) killed\\\]" \
+	"kill inferior"
 }
 
 foreach_with_prefix displaced_stepping {auto off} {
-- 
2.36.0
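As an illustration only, here is a minimal, self-contained C++ sketch of the two mechanisms the commit message relies on: recognizing the PTRACE_EVENT_EXEC stop from a waitpid(2) status, using the check quoted from the ptrace man page, and, on such an event, dropping every other LWP of the same process from the LWP list. The `lwp_entry` type and the `is_exec_event` and `prune_pre_exec_lwps` functions are hypothetical stand-ins, not GDB's `lwp_info`, `exit_lwp` or `all_lwps_safe`. The list is assumed to be a plain `std::list`, so erasing through the iterator returned by `erase` keeps the walk valid while elements are removed. It assumes Linux with glibc's `<sys/ptrace.h>`.

```cpp
#include <csignal>       // SIGTRAP
#include <list>
#include <sys/ptrace.h>  // PTRACE_EVENT_EXEC (glibc, Linux only)
#include <sys/wait.h>    // WIFSTOPPED

/* Hypothetical, simplified stand-in for an LWP list entry: the
   thread group ID (pid) and the kernel thread ID (tid).  */
struct lwp_entry
{
  int pid;
  int tid;
};

/* Return true if STATUS, as returned by waitpid(2) for a ptraced
   thread, is the exec-event stop described in ptrace(2):
   status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8)).  */
static bool
is_exec_event (int status)
{
  return (WIFSTOPPED (status)
	  && (status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8)));
}

/* Called once an exec event has been reported for EXECING, the
   entry whose tid now equals the tgid.  Every other entry of the
   same process refers to a thread that no longer exists, so remove
   it right away, before anything else can look it up.  Entries of
   other processes are left alone.  */
static void
prune_pre_exec_lwps (std::list<lwp_entry> &lwps, const lwp_entry *execing)
{
  for (auto it = lwps.begin (); it != lwps.end ();)
    {
      if (&*it != execing && it->pid == execing->pid)
	it = lwps.erase (it);  // erase returns the next valid iterator
      else
	++it;
    }
}
```

Like the patch, this walks the whole list instead of fetching the former thread ID with PTRACE_GETEVENTMSG, which is only available since Linux 3.0 and would still require looking up the matching entry afterwards.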