From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-f47.google.com (mail-wr1-f47.google.com [209.85.221.47]) by sourceware.org (Postfix) with ESMTPS id 7BC9538582B7 for ; Mon, 13 Nov 2023 14:04:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7BC9538582B7 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=palves.net Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 7BC9538582B7 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=209.85.221.47 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699884264; cv=none; b=VjHCkeE9dDZmtLkPs4iYhOI6Jig09GcQxF3Vfack3oDdBWoZ/xvGnSDDfFJoWB1Sw+65CaaSytIWc5VK3HRUKjBA0PD/Ctv57QSNMpfUIv927RJpMYcYwPaizEheavrleFcJmmvV+v/dT/ZqnwRXgIIfadbSaui+U1JpuTptohA= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699884264; c=relaxed/simple; bh=K6HMvC3dN5o0+XhxJH0Ik9P8iWCu08I/8GDVnMYKU3E=; h=Message-ID:Date:MIME-Version:Subject:To:From; b=fkHLxFzthofSogWGLlJEbLgzBaacNH9lEzmjvvSXJrA22tk/pEoiN1iKcpw0ywXlZjOKz87nSpqbTzEg4ewtSu+S0rHqvkmFl5aL74Suu31PPbj9gzAM6kFiC0o7ncUwzYuOXeEtq2xG+0a9a/4zp8D+NmBVHv9HfyG8qRMNPAs= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wr1-f47.google.com with SMTP id ffacd0b85a97d-32d895584f1so2721404f8f.1 for ; Mon, 13 Nov 2023 06:04:16 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699884255; x=1700489055; h=content-transfer-encoding:in-reply-to:from:references:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=RBWOTkn11nPd6+CD8olp/qiAzhIqmvvapisDAxMDQWU=; b=WoFd+pUZUJh2JidO/upbBmh5msEjDI4Zb/bHApQ2lFPTD2iP/pCJOzI0AmrhXp6okx 1QG4BticoAFaNkxqVnlhAuET5GLuA9CLJmI3nRGsZlb5xTQJZWiVUVEBXeDlCg58oXcW hmo97zT1DKYM/KZtv3GPP55eZsjpA0bJr0bNFNbcBhKuaRmxyfNDaOSXuJ82Q+VIvz2e sajLdiSpsvOBfe7C/LAhfwT74+B5KsQaY2gAU6UlpzZDapAjrH3HKgIrlv2l5/IBS0EF 3AHlpZ7jbZmmN2GZJIjlEbv2T5kMnB9weIt3YatuiczldhsZO85s50wF4FEpPbALdKQ5 MrvQ== X-Gm-Message-State: AOJu0YzkKnNy6E61ZxzL2FDeixhjTRpU/M20AnETVkrSqbeAnQQZPgv9 1g8qZ/FJ071vuvpAyrv5rblRTx9FkqA= X-Google-Smtp-Source: AGHT+IG76Mf1d3z2t2oCq1fGmimPB2GGDH002QqWzB2GPdbwcdqHZjaEiBUDme8ydqgRmPINn6VYjQ== X-Received: by 2002:a05:6000:178f:b0:32d:84c7:2f56 with SMTP id e15-20020a056000178f00b0032d84c72f56mr4900108wrg.21.1699884254926; Mon, 13 Nov 2023 06:04:14 -0800 (PST) Received: from ?IPV6:2001:8a0:f91e:1a00:8060:1e54:fb28:9635? ([2001:8a0:f91e:1a00:8060:1e54:fb28:9635]) by smtp.gmail.com with ESMTPSA id w7-20020adff9c7000000b0032fbd0c7d04sm5525593wrr.55.2023.11.13.06.04.13 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 13 Nov 2023 06:04:14 -0800 (PST) Message-ID: <51c04503-ec38-4366-b02d-91da84b5ba3c@palves.net> Date: Mon, 13 Nov 2023 14:04:02 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 03/31] gdb/linux: Delete all other LWPs immediately on ptrace exec event Content-Language: en-US To: Andrew Burgess , gdb-patches@sourceware.org References: <20221212203101.1034916-1-pedro@palves.net> <20221212203101.1034916-4-pedro@palves.net> <87ileucg5f.fsf@redhat.com> <7346b585-adb2-743e-fdaf-213fc595f93b@palves.net> <5b80a2c3-3679-fb86-27f3-0dcc9c019562@palves.net> <87pm6n5e20.fsf@redhat.com> From: Pedro Alves In-Reply-To: <87pm6n5e20.fsf@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3.3 required=5.0 tests=BAYES_00,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi! At long last, I've resumed work on this again. A belated thank you so much for the reviews. On 2023-05-26 16:04, Andrew Burgess wrote: > Pedro Alves writes: >> + /* Display one table row for each lwp_info. */ >> + for (lwp_info *lp : all_lwps ()) >> + { >> + ui_out_emit_tuple tuple_emitter (uiout, "lwp-entry"); >> + >> + struct thread_info *th = find_thread_ptid (linux_target, lp->ptid); > > After recent changes this line becomes: > > struct thread_info *th = linux_target->find_thread (lp->ptid); Thanks, done. >> From ee0a276c08b829ae504fe0eba5badc4f7faf3676 Mon Sep 17 00:00:00 2001 >> From: Pedro Alves >> Date: Wed, 13 Jul 2022 17:16:38 +0100 >> Subject: [PATCH 2/2] gdb/linux: Delete all other LWPs immediately on ptrace >> exec event >> >> I noticed that on an Ubuntu 20.04 system, after a following patch >> ("Step over clone syscall w/ breakpoint, >> TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp >> was passing cleanly, but still, we'd end up with four new unexpected >> GDB core dumps: >> >> === gdb Summary === >> >> # of unexpected core files 4 >> # of expected passes 48 >> >> That said patch is making the pre-existing >> gdb.threads/step-over-exec.exp testcase (almost silently) expose a >> latent problem in gdb/linux-nat.c, resulting in a GDB crash when: >> >> #1 - a non-leader thread execs >> #2 - the post-exec program stops somewhere >> #3 - you kill the inferior >> >> Instead of #3 directly, the testcase just returns, which ends up in >> gdb_exit, tearing down GDB, which kills the inferior, and is thus >> equivalent to #3 above. >> >> Vis: >> >> $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true >> ... >> (top-gdb) r >> ... >> (gdb) b main >> ... >> (gdb) r >> ... >> Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69 >> 69 argv0 = argv[0]; >> (gdb) c >> Continuing. >> [New Thread 0x7ffff7d89700 (LWP 2506975)] >> Other going in exec. >> Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd >> process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd >> >> Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28 >> 28 foo (); >> (gdb) k >> ... >> Thread 1 "gdb" received signal SIGSEGV, Segmentation fault. >> 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 >> 393 return m_suspend.waitstatus_pending_p; >> (top-gdb) bt >> #0 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 >> #1 0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345 >> #2 0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564 >> #3 0x0000555555a92a26 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284 >> #4 0x0000555555a92a51 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278 >> #5 0x0000555555a91f84 in gdb::function_view::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247 >> #6 0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864 >> #7 0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 ) at ../../src/gdb/linux-nat.c:3590 >> #8 0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911 >> ... > > As I mentioned in my other message, this backtrace includes > kill_unfollowed_child_callback, which doesn't exist yet! I think that's > OK though, the text before the backtrace does make it clear that you saw > this problem only after applying a later patch. I've tweaked the commit log to make that more explicit. > >> >> The root of the problem is that when a non-leader LWP execs, it just >> changes its tid to the tgid, replacing the pre-exec leader thread, >> becoming the new leader. There's no thread exit event for the execing >> thread. It's as if the old pre-exec LWP vanishes without trace. The >> ptrace man page says: >> >> "PTRACE_O_TRACEEXEC (since Linux 2.5.46) >> Stop the tracee at the next execve(2). A waitpid(2) by the >> tracer will return a status value such that >> >> status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8)) >> >> If the execing thread is not a thread group leader, the thread >> ID is reset to thread group leader's ID before this stop. >> Since Linux 3.0, the former thread ID can be retrieved with >> PTRACE_GETEVENTMSG." >> >> When the core of GDB processes an exec events, it deletes all the >> threads of the inferior. But, that is too late -- deleting the thread >> does not delete the corresponding LWP, so we end leaving the pre-exec >> non-leader LWP stale in the LWP list. That's what leads to the crash >> above -- linux_nat_target::kill iterates over all LWPs, and after the >> patch in question, that code will look for the corresponding >> thread_info for each LWP. For the pre-exec non-leader LWP still >> listed, won't find one. >> >> This patch fixes it, by deleting the pre-exec non-leader LWP (and >> thread) from the LWP/thread lists as soon as we get an exec event out >> of ptrace. >> >> GDBserver does not need an equivalent fix, because it is already doing >> this, as side effect of mourning the pre-exec process, in >> gdbserver/linux-low.cc: >> >> else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events) >> { >> ... >> /* Delete the execing process and all its threads. */ >> mourn (proc); >> switch_to_thread (nullptr); >> >> >> The crash with gdb.threads/step-over-exec.exp is not observable on >> newer systems, which postdate the glibc change to move "libpthread.so" >> internals to "libc.so.6", because right after the exec, GDB traps a >> load event for "libc.so.6", which leads to GDB trying to open >> libthread_db for the post-exec inferior, and, on such systems that >> succeeds. When we load libthread_db, we call >> linux_stop_and_wait_all_lwps, which, as the name suggests, stops all >> lwps, and then waits to see their stops. While doing this, GDB >> detects that the pre-exec stale LWP is gone, and deletes it. >> >> If we use "catch exec" to stop right at the exec before the >> "libc.so.6" load event ever happens, and issue "kill" right there, >> then GDB crashes on newer systems as well. So instead of tweaking >> gdb.threads/step-over-exec.exp to cover the fix, add a new >> gdb.threads/threads-after-exec.exp testcase that uses "catch exec". > > Maybe it's worth mentioning that because the crash itself only happens > once a later patch is applied we use 'maint info linux-lwps' to reveal > the issue for now? I've done something like that. >> >> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643 >> --- >> gdb/infrun.c | 8 +-- >> gdb/linux-nat.c | 15 ++++ >> .../gdb.threads/threads-after-exec.exp | 70 +++++++++++++++++++ > > Oops, this diff is missing the two source files for this test (.c and > -execd.c). I was able to figure something out though so I could test > the rest of this patch :) > Ouch. And enough time has passed that I completely lost those files. But as you discovered, they're trivial enough to rewrite. Actually, I simplified, and only use one .c file this time. >> 3 files changed, 88 insertions(+), 5 deletions(-) >> create mode 100644 gdb/testsuite/gdb.threads/threads-after-exec.exp >> ... >> + >> +standard_testfile .c -execd.c >> + >> +proc do_test { } { >> + global srcdir subdir srcfile srcfile2 binfile testfile >> + global decimal >> + >> + # Compile main binary (the one that does the exec). >> + if {[gdb_compile_pthreads $srcdir/$subdir/$srcfile $binfile \ >> + executable {debug}] != "" } { >> + return -1 >> + } > > You can do: > > if {[build_executable "failed to build main executable" \ > $binfile $srcfile {debug pthread}] == -1} { > return -1 > } > >> + >> + # Compile the second binary (the one that gets exec'd). >> + if {[gdb_compile $srcdir/$subdir/$srcfile2 $binfile-execd \ >> + executable {debug}] != "" } { >> + return -1 >> + } > > And: > > if {[build_executable "failed to build execd executable" \ > $binfile-execd $srcfile2 {debug}] == -1} { > return -1 > } > > I thought we were moving away from calling the gdb_compile* functions > directly. > Done-ish -- since I only have one test program this time, I can use prepare_for_testing, instead. > Assuming the missing source files are added, this all looks great. > > Reviewed-By: Andrew Burgess