Message-ID: <0d63c994-4146-351d-f0c4-a57c42b8a8a5@simark.ca>
Date: Wed, 20 Jul 2022 20:45:45 -0400
Subject: Re: [PATCH v2 03/29] gdb/linux: Delete all other LWPs immediately on ptrace exec event
To: Pedro Alves, gdb-patches@sourceware.org
References: <20220713222433.374898-1-pedro@palves.net> <20220713222433.374898-4-pedro@palves.net>
From: Simon Marchi
In-Reply-To: <20220713222433.374898-4-pedro@palves.net>

On 2022-07-13 18:24, Pedro Alves wrote:
> I noticed that after a following patch ("Step over clone syscall w/
> breakpoint, TARGET_WAITKIND_THREAD_CLONED"), the
> gdb.threads/step-over-exec.exp was passing cleanly, but still, we'd
> end
> up with four new unexpected GDB core dumps:
>
>     === gdb Summary ===
>
>   # of unexpected core files      4
>   # of expected passes            48
>
> That said patch is making the pre-existing
> gdb.threads/step-over-exec.exp testcase (almost silently) expose a
> latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
>
>  #1 - a non-leader thread execs
>  #2 - the post-exec program stops somewhere
>  #3 - you kill the inferior
>
> Instead of #3 directly, the testcase just returns, which ends up in
> gdb_exit, tearing down GDB, which kills the inferior, and is thus
> equivalent to #3 above.
>
> Vis:
>
>  $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
>  ...
>  (top-gdb) r
>  ...
>  (gdb) b main
>  ...
>  (gdb) r
>  ...
>  Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
>  69        argv0 = argv[0];
>  (gdb) c
>  Continuing.
>  [New Thread 0x7ffff7d89700 (LWP 2506975)]
>  Other going in exec.
>  Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>  process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>
>  Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
>  28        foo ();
>  (gdb) k
>  ...
>  Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
>  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>  393         return m_suspend.waitstatus_pending_p;
>  (top-gdb) bt
>  #0  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>  #1  0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
>  #2  0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
>  #3  0x0000555555a92a26 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
>  #4  0x0000555555a92a51 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
>  #5  0x0000555555a91f84 in gdb::function_view::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
>  #6  0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
>  #7  0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 ) at ../../src/gdb/linux-nat.c:3590
>  #8  0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
>  ...
>
> The root of the problem is that when a non-leader LWP execs, it just
> changes its tid to the tgid, replacing the pre-exec leader thread,
> becoming the new leader.  There's no thread exit event for the execing
> thread.  It's as if the old pre-exec LWP vanishes without trace.  The
> ptrace man page says:
>
>   "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
>       Stop the tracee at the next execve(2).
>       A waitpid(2) by the
>       tracer will return a status value such that
>
>         status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
>
>       If the execing thread is not a thread group leader, the thread
>       ID is reset to thread group leader's ID before this stop.
>       Since Linux 3.0, the former thread ID can be retrieved with
>       PTRACE_GETEVENTMSG."
>
> When the core of GDB processes an exec events, it deletes all the
> threads of the inferior.  But, that is too late -- deleting the thread
> does not delete the corresponding LWP, so we end leaving the pre-exec
> non-leader LWP stale in the LWP list.  That's what leads to the crash
> above -- linux_nat_target::kill iterates over all LWPs, and after the
> patch in question, that code will look for the corresponding
> thread_info for each LWP.  For the pre-exec non-leader LWP still
> listed, won't find one.
>
> This patch fixes it, by deleting the pre-exec non-leader LWP (and
> thread) from the LWP/thread lists as seen as we get an exec event out

seen -> soon

Otherwise LGTM.

Simon
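
For readers following the thread, here is a rough sketch of the idea in the quoted description: recognize the exec stop from the waitpid status, then drop every other LWP of that thread group at once so that no stale entry is left in the list. It is not the code from the patch. The names sketch_lwp, sketch_is_exec_stop and sketch_prune_on_exec are hypothetical, the array stands in for GDB's real LWP list, and the exec event code is spelled out as a local constant on the assumption that it matches Linux's PTRACE_EVENT_EXEC.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Assumption: 4 is the Linux value of PTRACE_EVENT_EXEC; it is spelled
   out here so the sketch builds without <sys/ptrace.h>.  */
#define SKETCH_PTRACE_EVENT_EXEC 4

/* Hypothetical stand-in for an LWP list entry; not GDB's lwp_info.  */
struct sketch_lwp
{
  pid_t tgid; /* Thread group (process) ID.  */
  pid_t tid;  /* Kernel thread ID.  */
};

/* Return true if STATUS, as returned by waitpid, reports an exec stop,
   using the comparison quoted from the ptrace man page.  */
bool
sketch_is_exec_stop (int status)
{
  return (status >> 8) == (SIGTRAP | (SKETCH_PTRACE_EVENT_EXEC << 8));
}

/* After an exec in thread group TGID, drop every entry of that group
   except the leader (tid == tgid).  Entries of other groups are kept.
   Compacts LWPS in place and returns the new entry count.  */
size_t
sketch_prune_on_exec (struct sketch_lwp *lwps, size_t count, pid_t tgid)
{
  size_t kept = 0;

  assert (lwps != NULL || count == 0);
  for (size_t i = 0; i < count; i++)
    {
      bool stale = lwps[i].tgid == tgid && lwps[i].tid != tgid;

      if (!stale)
        lwps[kept++] = lwps[i];
    }
  return kept;
}
```

A caller would check sketch_is_exec_stop on each waitpid status and, on a match, call sketch_prune_on_exec with the reported pid (which is the tgid after the exec) before handing the event to the rest of the debugger. A later walk over the list then only sees LWPs that still exist.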