From: Andrew Burgess
To: Pedro Alves, gdb-patches@sourceware.org
Subject: Re: [PATCH 03/31] gdb/linux: Delete all other LWPs immediately on ptrace exec event
In-Reply-To: <5b80a2c3-3679-fb86-27f3-0dcc9c019562@palves.net>
References: <20221212203101.1034916-1-pedro@palves.net> <20221212203101.1034916-4-pedro@palves.net> <87ileucg5f.fsf@redhat.com> <7346b585-adb2-743e-fdaf-213fc595f93b@palves.net> <5b80a2c3-3679-fb86-27f3-0dcc9c019562@palves.net>
Date: Fri, 26 May 2023 16:04:55 +0100
Message-ID: <87pm6n5e20.fsf@redhat.com>

Pedro Alves writes:

> Hi!
>
> On 2023-04-04 2:57 p.m., Pedro Alves wrote:
>> On 2023-03-21 2:50 p.m., Andrew Burgess wrote:
>>>
>>> I thought it was the second case, but I was so unsure that I tried the
>>> reproducer anyway.  Just in case I'm wrong, the above example doesn't
>>> seem to fail prior to this commit.
>>
>> This surprised me, and when I tried it myself, I was even more surprised,
>> for I couldn't reproduce it either!
>>
>> But I figured it out.
>>
>> I'm usually using Ubuntu 22.04 for development nowadays, and in that
>> system, indeed I can't reproduce it.  Right after the exec, GDB traps a
>> load event for "libc.so.6", which leads to gdb trying to open
>> libthread_db for the post-exec inferior, and, it succeeds.  When we
>> load libthread_db, we call linux_stop_and_wait_all_lwps, which, as the
>> name suggests, stops all lwps, and then waits to see their stops.
>> While doing this, GDB detects that the pre-exec stale LWP is gone, and
>> deletes it.
>>
>> The logs show:
>>
>> [linux-nat] linux_nat_wait_1: waitpid 1725529 received SIGTRAP - Trace/breakpoint trap (stopped)
>> [linux-nat] save_stop_reason: 1725529.1725529.0 stopped by software breakpoint
>> [linux-nat] linux_nat_wait_1: waitpid(-1, ...) returned 0, ERRNO-OK
>> [linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725658.0, not stopped
>> [linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725529.0, has pending status
>> [linux-nat] linux_nat_wait_1: trap ptid is 1725529.1725529.0.
>> [linux-nat] linux_nat_wait_1: exit
>> [linux-nat] stop_callback: kill 1725529.1725658.0 ****
>> [linux-nat] stop_callback: lwp kill -1 No such process
>> [linux-nat] wait_lwp: 1725529.1725658.0 vanished.
>>
>> And the backtrace is:
>>
>> (top-gdb) bt
>> #0  wait_lwp (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2069
>> #1  0x0000555555aa8fbf in stop_wait_callback (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2375
>> #2  0x0000555555ab12b3 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (__closure=0x0, ecall=..., args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:326
>> #3  0x0000555555ab12e2 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:320
>> #4  0x0000555555ab0610 in gdb::function_view::operator()(lwp_info*) const (this=0x7fffffffca90, args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:289
>> #5  0x0000555555aa4c2d in iterate_over_lwps(ptid_t, gdb::function_view) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:867
>> #6  0x0000555555aa8a03 in linux_stop_and_wait_all_lwps () at ../../src/gdb/linux-nat.c:2229
>> #7  0x0000555555ac8525 in try_thread_db_load_1 (info=0x555556a66dd0) at ../../src/gdb/linux-thread-db.c:923
>> #8  0x0000555555ac89d5 in try_thread_db_load (library=0x5555560eca27 "libthread_db.so.1", check_auto_load_safe=false) at ../../src/gdb/linux-thread-db.c:1024
>> #9  0x0000555555ac8eda in try_thread_db_load_from_sdir () at ../../src/gdb/linux-thread-db.c:1108
>> #10 0x0000555555ac9278 in thread_db_load_search () at ../../src/gdb/linux-thread-db.c:1163
>> #11 0x0000555555ac9518 in thread_db_load () at ../../src/gdb/linux-thread-db.c:1225
>> #12 0x0000555555ac95e1 in check_for_thread_db () at ../../src/gdb/linux-thread-db.c:1268
>> #13 0x0000555555ac9657 in thread_db_new_objfile (objfile=0x555556943ed0) at ../../src/gdb/linux-thread-db.c:1297
>> #14 0x000055555569e2d2 in std::__invoke_impl (__f=@0x5555567925d8: 0x555555ac95e8 ) at /usr/include/c++/11/bits/invoke.h:61
>> #15 0x000055555569c44a in std::__invoke_r (__fn=@0x5555567925d8: 0x555555ac95e8 ) at /usr/include/c++/11/bits/invoke.h:111
>> #16 0x0000555555699d69 in std::_Function_handler::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffce50: 0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:290
>> #17 0x0000555555b5f48b in std::function::operator()(objfile*) const (this=0x5555567925d8, __args#0=0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:590
>> #18 0x0000555555b5eba4 in gdb::observers::observable::notify (this=0x5555565b5680 , args#0=0x555556943ed0) at ../../src/gdb/../gdbsupport/observable.h:166
>> #19 0x0000555555cdd85b in symbol_file_add_with_addrs (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1131
>> #20 0x0000555555cdd9c5 in symbol_file_add_from_bfd (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1167
>> #21 0x0000555555c9dd69 in solib_read_symbols (so=0x5555569792d0, flags=...) at ../../src/gdb/solib.c:730
>> #22 0x0000555555c9e7b7 in solib_add (pattern=0x0, from_tty=0, readsyms=1) at ../../src/gdb/solib.c:1041
>> #23 0x0000555555c9f61d in handle_solib_event () at ../../src/gdb/solib.c:1315
>> #24 0x0000555555729c26 in bpstat_stop_status (aspace=0x555556606800, bp_addr=0x7ffff7fe7278, thread=0x555556816bd0, ws=..., stop_chain=0x0) at ../../src/gdb/breakpoint.c:5702
>> #25 0x0000555555a62e41 in handle_signal_stop (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6517
>> #26 0x0000555555a61479 in handle_inferior_event (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6000
>> #27 0x0000555555a5c7b5 in fetch_inferior_event () at ../../src/gdb/infrun.c:4403
>> #28 0x0000555555a35b65 in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:41
>> #29 0x0000555555aae0c9 in handle_target_event (error=0, client_data=0x0) at ../../src/gdb/linux-nat.c:4231
>>
>> Now, when I try the same on a Fedora 32 machine, I see the GDB crash
>> due to the stale LWP still in the LWP list with no corresponding
>> thread_info.  On this machine, glibc predates the changes that make it
>> possible to use libthread_db with non-threaded processes, so
>> try_thread_db_load doesn't manage to open a connection to libthread_db,
>> and thus we don't end up in linux_stop_and_wait_all_lwps, and thus the
>> stale lwp is not deleted.  And so a subsequent "kill" command crashes.
>>
>> I wrote that patch originally on an Ubuntu 20.04 machine (vs the Ubuntu
>> 22.04 I have now), and it must be that that version also predates the
>> glibc change, and thus behaves like this Fedora 32 box.  You are very
>> likely using a newer Fedora which has the glibc change.
>
> ...
>
>>> What are your thoughts on including this, or something like this with
>>> this commit?
>>> My patch, which applies on top of this commit, is included at the
>>> end of this email.  Please feel free to take any changes that you
>>> feel add value.
>>
>> I'm totally fine with such a command, though the test I had added covers
>> as much as it would, as the "kill" command fails when the maint command
>> would fail, and passes when the maint command passes.  But I'll
>> incorporate it.
>>
>
> I realized that my description of the problem above practically
> suggests a way to expose the crash everywhere -- just catch the exec
> event with "catch exec", so that the post-exec program doesn't even
> get to the libc.so.6 load event, and issue "kill" there, or use
> "maint info linux-lwps".  So I've adjusted the patch to add a new
> testcase doing that.  I've attached two patches, one adding your
> "maint info linux-lwps", now with NEWS/docs, and the updated version
> of the crash fix and testcase.
>
> WDYT?
>
> Pedro Alves
>
> From 450e0133fc884f027cce4ae65378ea5560f6464d Mon Sep 17 00:00:00 2001
> From: Andrew Burgess
> Date: Tue, 4 Apr 2023 14:50:35 +0100
> Subject: [PATCH 1/2] Add "maint info linux-lwps" command
>
> This adds a maintenance command that lets you list all the LWPs under
> control of the linux-nat target.
>
> For example:
>
>  (gdb) maint info linux-lwps
>  LWP Ptid        Thread ID
>  560948.561047.0 None
>  560948.560948.0 1.1
>
> This shows that "560948.561047.0" LWP doesn't map to any thread_info
> object, which is bogus.  We'll be using this in a testcase in a
> following patch.
>
> Co-Authored-By: Pedro Alves
> Change-Id: Ic4e9e123385976e5cd054391990124b7a20fb3f5
> ---
>  gdb/NEWS            |  3 +++
>  gdb/doc/gdb.texinfo |  4 ++++
>  gdb/linux-nat.c     | 46 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 53 insertions(+)
>
> diff --git a/gdb/NEWS b/gdb/NEWS
> index d729aa24056..3747e7d52c1 100644
> --- a/gdb/NEWS
> +++ b/gdb/NEWS
> @@ -78,6 +78,9 @@ maintenance info frame-unwinders
>  maintenance wait-for-index-cache
>    Wait until all pending writes to the index cache have completed.
>
> +maintenance info linux-lwps
> +  List all LWPs under control of the linux-nat target.
> +
>  set always-read-ctf on|off
>  show always-read-ctf
>    When off, CTF is only read if DWARF is not present.  When on, CTF is
> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> index 6c811b8be2e..398bbb88af6 100644
> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -40605,6 +40605,10 @@ module (@pxref{Disassembly In Python}), and will only be present after
>  that module has been imported.  To force the module to be imported do
>  the following:
>
> +@kindex maint info linux-lwps
> +@item maint info linux-lwps
> +Print information about LWPs under control of the Linux native target.
> +
>  @smallexample
>  (@value{GDBP}) python import gdb.disassembler
>  @end smallexample
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 944f23de01a..68816ddc999 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -4479,6 +4479,49 @@ current_lwp_ptid (void)
>    return inferior_ptid;
>  }
>
> +/* Implement 'maintenance info linux-lwps'.  Displays some basic
> +   information about all the current lwp_info objects.  */
> +
> +static void
> +maintenance_info_lwps (const char *arg, int from_tty)
> +{
> +  if (all_lwps ().size () == 0)
> +    {
> +      gdb_printf ("No Linux LWPs\n");
> +      return;
> +    }
> +
> +  /* Start the width at 8 to match the column heading below, then
> +     figure out the widest ptid string.  We'll use this to build our
> +     output table below.  */
> +  size_t ptid_width = 8;
> +  for (lwp_info *lp : all_lwps ())
> +    ptid_width = std::max (ptid_width, lp->ptid.to_string ().size ());
> +
> +  /* Setup the table headers.  */
> +  struct ui_out *uiout = current_uiout;
> +  ui_out_emit_table table_emitter (uiout, 2, -1, "linux-lwps");
> +  uiout->table_header (ptid_width, ui_left, "lwp-ptid", _("LWP Ptid"));
> +  uiout->table_header (9, ui_left, "thread-info", _("Thread ID"));
> +  uiout->table_body ();
> +
> +  /* Display one table row for each lwp_info.  */
> +  for (lwp_info *lp : all_lwps ())
> +    {
> +      ui_out_emit_tuple tuple_emitter (uiout, "lwp-entry");
> +
> +      struct thread_info *th = find_thread_ptid (linux_target, lp->ptid);

After recent changes this line becomes:

  struct thread_info *th = linux_target->find_thread (lp->ptid);

> +
> +      uiout->field_string ("lwp-ptid", lp->ptid.to_string ().c_str ());
> +      if (th == nullptr)
> +        uiout->field_string ("thread-info", "None");
> +      else
> +        uiout->field_string ("thread-info", print_full_thread_id (th));
> +
> +      uiout->message ("\n");
> +    }
> +}
> +
>  void _initialize_linux_nat ();
>  void
>  _initialize_linux_nat ()
> @@ -4516,6 +4559,9 @@ Enables printf debugging output."),
>    sigemptyset (&blocked_mask);
>
>    lwp_lwpid_htab_create ();
> +
> +  add_cmd ("linux-lwps", class_maintenance, maintenance_info_lwps,
> +           _("List the Linux LWPS."), &maintenanceinfolist);
>  }
>
> base-commit: 57573e54afb9f7ed957eec43dfd2830f2384c970
> prerequisite-patch-id: 3a896bfe4b7c66a2e3a6aa668c5ae8395e5d8a52
> --
> 2.36.0
>
> From ee0a276c08b829ae504fe0eba5badc4f7faf3676 Mon Sep 17 00:00:00 2001
> From: Pedro Alves
> Date: Wed, 13 Jul 2022 17:16:38 +0100
> Subject: [PATCH 2/2] gdb/linux: Delete all other LWPs immediately on ptrace
>  exec event
>
> I noticed that on an Ubuntu 20.04 system, after a following patch
> ("Step over clone syscall w/ breakpoint,
> TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp
> was passing cleanly, but still, we'd end up with four new unexpected
> GDB core dumps:
>
>   === gdb Summary ===
>
>  # of unexpected core files  4
>  # of expected passes        48
>
> That said patch is making the pre-existing
> gdb.threads/step-over-exec.exp testcase (almost silently) expose a
> latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
>
>  #1 - a non-leader thread execs
>  #2 - the post-exec program stops somewhere
>  #3 - you kill the inferior
>
> Instead of #3 directly, the testcase just returns, which ends up in
> gdb_exit, tearing down GDB, which kills the inferior, and is thus
> equivalent to #3 above.
>
> Vis:
>
>  $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
>  ...
>  (top-gdb) r
>  ...
>  (gdb) b main
>  ...
>  (gdb) r
>  ...
>  Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
>  69        argv0 = argv[0];
>  (gdb) c
>  Continuing.
>  [New Thread 0x7ffff7d89700 (LWP 2506975)]
>  Other going in exec.
>  Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>  process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>
>  Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
>  28        foo ();
>  (gdb) k
>  ...
>  Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
>  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>  393           return m_suspend.waitstatus_pending_p;
>  (top-gdb) bt
>  #0  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>  #1  0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
>  #2  0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
>  #3  0x0000555555a92a26 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
>  #4  0x0000555555a92a51 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
>  #5  0x0000555555a91f84 in gdb::function_view::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
>  #6  0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
>  #7  0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 ) at ../../src/gdb/linux-nat.c:3590
>  #8  0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
>  ...

As I mentioned in my other message, this backtrace includes
kill_unfollowed_child_callback, which doesn't exist yet!  I think
that's OK though, the text before the backtrace does make it clear
that you saw this problem only after applying a later patch.

>
> The root of the problem is that when a non-leader LWP execs, it just
> changes its tid to the tgid, replacing the pre-exec leader thread,
> becoming the new leader.  There's no thread exit event for the
> execing thread.
> It's as if the old pre-exec LWP vanishes without trace.  The ptrace
> man page says:
>
>  "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
>      Stop the tracee at the next execve(2).  A waitpid(2) by the
>      tracer will return a status value such that
>
>        status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
>
>      If the execing thread is not a thread group leader, the thread
>      ID is reset to thread group leader's ID before this stop.
>      Since Linux 3.0, the former thread ID can be retrieved with
>      PTRACE_GETEVENTMSG."
>
> When the core of GDB processes an exec event, it deletes all the
> threads of the inferior.  But, that is too late -- deleting the thread
> does not delete the corresponding LWP, so we end up leaving the
> pre-exec non-leader LWP stale in the LWP list.  That's what leads to
> the crash above -- linux_nat_target::kill iterates over all LWPs, and
> after the patch in question, that code will look for the corresponding
> thread_info for each LWP.  For the pre-exec non-leader LWP still
> listed, it won't find one.
>
> This patch fixes it, by deleting the pre-exec non-leader LWP (and
> thread) from the LWP/thread lists as soon as we get an exec event out
> of ptrace.
>
> GDBserver does not need an equivalent fix, because it is already doing
> this, as a side effect of mourning the pre-exec process, in
> gdbserver/linux-low.cc:
>
>  else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
>    {
>      ...
>      /* Delete the execing process and all its threads.  */
>      mourn (proc);
>      switch_to_thread (nullptr);
>
> The crash with gdb.threads/step-over-exec.exp is not observable on
> newer systems, which postdate the glibc change to move "libpthread.so"
> internals to "libc.so.6", because right after the exec, GDB traps a
> load event for "libc.so.6", which leads to GDB trying to open
> libthread_db for the post-exec inferior, and, on such systems, that
> succeeds.
> When we load libthread_db, we call linux_stop_and_wait_all_lwps,
> which, as the name suggests, stops all lwps, and then waits to see
> their stops.  While doing this, GDB detects that the pre-exec stale
> LWP is gone, and deletes it.
>
> If we use "catch exec" to stop right at the exec before the
> "libc.so.6" load event ever happens, and issue "kill" right there,
> then GDB crashes on newer systems as well.  So instead of tweaking
> gdb.threads/step-over-exec.exp to cover the fix, add a new
> gdb.threads/threads-after-exec.exp testcase that uses "catch exec".

Maybe it's worth mentioning that, because the crash itself only happens
once a later patch is applied, we use 'maint info linux-lwps' to reveal
the issue for now?

>
> Also tweak a comment in infrun.c:follow_exec referring to how
> linux-nat.c used to behave, as it would become stale otherwise.
>
> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
> ---
>  gdb/infrun.c                                  |  8 +--
>  gdb/linux-nat.c                               | 15 ++++
>  .../gdb.threads/threads-after-exec.exp        | 70 +++++++++++++++++++

Oops, this diff is missing the two source files for this test (.c and
-execd.c).  I was able to figure something out though, so I could test
the rest of this patch :)

>  3 files changed, 88 insertions(+), 5 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.threads/threads-after-exec.exp
>
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index abe49ae0f2f..93edc224622 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -1224,13 +1224,11 @@ follow_exec (ptid_t ptid, const char *exec_file_target)
>     some other thread does the exec, and even if the main thread was
>     stopped or already gone.  We may still have non-leader threads of
>     the process on our list.  E.g., on targets that don't have thread
> -   exit events (like remote); or on native Linux in non-stop mode if
> -   there were only two threads in the inferior and the non-leader
> -   one is the one that execs (and nothing forces an update of the
> -   thread list up to here).  When debugging remotely, it's best to
> +   exit events (like remote) and nothing forces an update of the
> +   thread list up to here.  When debugging remotely, it's best to
>     avoid extra traffic, when possible, so avoid syncing the thread
>     list with the target, and instead go ahead and delete all threads
> -   of the process but one that reported the event.  Note this must
> +   of the process but the one that reported the event.  Note this must
>     be done before calling update_breakpoints_after_exec, as
>     otherwise clearing the threads' resources would reference stale
>     thread breakpoints -- it may have been one of these threads that
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 68816ddc999..90ac94440b8 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -2001,6 +2001,21 @@ linux_handle_extended_wait (struct lwp_info *lp, int status)
>           thread execs, it changes its tid to the tgid, and the old
>           tgid thread might have not been resumed.  */
>        lp->resumed = 1;
> +
> +      /* All other LWPs are gone now.  We'll have received a thread
> +         exit notification for all threads other than the execing
> +         one.  That one, if it wasn't the leader, just silently
> +         changes its tid to the tgid, and the previous leader
> +         vanishes.  Since Linux 3.0, the former thread ID can be
> +         retrieved with PTRACE_GETEVENTMSG, but since we support
> +         older kernels, don't bother with it, and just walk the LWP
> +         list.  Even with PTRACE_GETEVENTMSG, we'd still need to look
> +         up the corresponding LWP object, and it would be an extra
> +         ptrace syscall, so this way may even be more efficient.  */
> +      for (lwp_info *other_lp : all_lwps_safe ())
> +        if (other_lp != lp && other_lp->ptid.pid () == lp->ptid.pid ())
> +          exit_lwp (other_lp);
> +
>        return 0;
>      }
>
> diff --git a/gdb/testsuite/gdb.threads/threads-after-exec.exp b/gdb/testsuite/gdb.threads/threads-after-exec.exp
> new file mode 100644
> index 00000000000..824dda349a6
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/threads-after-exec.exp
> @@ -0,0 +1,70 @@
> +# Copyright 2023 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test that after an exec of a non-leader thread, we don't leave the
> +# non-leader thread listed in internal thread lists, causing problems.
> +
> +standard_testfile .c -execd.c
> +
> +proc do_test { } {
> +    global srcdir subdir srcfile srcfile2 binfile testfile
> +    global decimal
> +
> +    # Compile main binary (the one that does the exec).
> +    if {[gdb_compile_pthreads $srcdir/$subdir/$srcfile $binfile \
> +             executable {debug}] != "" } {
> +        return -1
> +    }

You can do:

  if {[build_executable "failed to build main executable" \
	   $binfile $srcfile {debug pthread}] == -1} {
      return -1
  }

> +
> +    # Compile the second binary (the one that gets exec'd).
> +    if {[gdb_compile $srcdir/$subdir/$srcfile2 $binfile-execd \
> +             executable {debug}] != "" } {
> +        return -1
> +    }

And:

  if {[build_executable "failed to build execd executable" \
	   $binfile-execd $srcfile2 {debug}] == -1} {
      return -1
  }

I thought we were moving away from calling the gdb_compile* functions
directly.

Assuming the missing source files are added, this all looks great.

Reviewed-By: Andrew Burgess

Thanks,
Andrew

> +
> +    clean_restart $binfile
> +
> +    if ![runto_main] {
> +        return
> +    }
> +
> +    gdb_test "catch exec" "Catchpoint $decimal \\(exec\\)"
> +
> +    gdb_test "continue" "Catchpoint $decimal .*" "continue until exec"
> +
> +    # Confirm we only have one thread in the thread list.
> +    gdb_test "info threads" "\\* 1\[ \t\]+\[^\r\n\]+.*"
> +
> +    if {[istarget *-*-linux*] && [gdb_is_target_native]} {
> +        # Confirm there's only one LWP in the list as well, and that
> +        # it is bound to thread 1.1.
> +        set inf_pid [get_inferior_pid]
> +        gdb_test_multiple "maint info linux-lwps" "" {
> +            -wrap -re "Thread ID *\r\n$inf_pid\.$inf_pid\.0\[ \t\]+1\.1 *" {
> +                pass $gdb_test_name
> +            }
> +        }
> +    }
> +
> +    # Test that GDB is able to kill the inferior.  This used to crash
> +    # on native Linux as GDB did not dispose of the pre-exec LWP for
> +    # the non-leader (and that LWP did not have a matching thread in
> +    # the core thread list).
> +    gdb_test "with confirm off -- kill" \
> +        "\\\[Inferior 1 (.*) killed\\\]" \
> +        "kill inferior"
> +}
> +
> +do_test
> --
> 2.36.0