From: Andrew Burgess
To: Pedro Alves, gdb-patches@sourceware.org
Subject: Re: [PATCH 03/31] gdb/linux: Delete all other LWPs immediately on ptrace exec event
In-Reply-To: <5b80a2c3-3679-fb86-27f3-0dcc9c019562@palves.net>
References: <20221212203101.1034916-1-pedro@palves.net> <20221212203101.1034916-4-pedro@palves.net> <87ileucg5f.fsf@redhat.com> <7346b585-adb2-743e-fdaf-213fc595f93b@palves.net> <5b80a2c3-3679-fb86-27f3-0dcc9c019562@palves.net>
Date: Fri, 26 May 2023 16:04:55 +0100
Message-ID: <87pm6n5e20.fsf@redhat.com>

Pedro Alves writes:

> Hi!
>
> On 2023-04-04 2:57 p.m., Pedro Alves wrote:
>> On 2023-03-21 2:50 p.m., Andrew Burgess wrote:
>>>
>>> I thought it was the second case, but I was so unsure that I tried the
>>> reproducer anyway.  Just in case I'm wrong, the above example doesn't
>>> seem to fail prior to this commit.
>>
>> This surprised me, and when I tried it myself, I was even more surprised,
>> for I couldn't reproduce it either!
>>
>> But I figured it out.
>>
>> I'm usually using Ubuntu 22.04 for development nowadays, and in that
>> system, indeed I can't reproduce it.  Right after the exec, GDB traps a
>> load event for "libc.so.6", which leads to gdb trying to open
>> libthread_db for the post-exec inferior, and, it succeeds.  When we
>> load libthread_db, we call linux_stop_and_wait_all_lwps, which, as the
>> name suggests, stops all lwps, and then waits to see their stops.
>> While doing this, GDB detects that the pre-exec stale LWP is gone, and
>> deletes it.
>>
>> The logs show:
>>
>> [linux-nat] linux_nat_wait_1: waitpid 1725529 received SIGTRAP - Trace/breakpoint trap (stopped)
>> [linux-nat] save_stop_reason: 1725529.1725529.0 stopped by software breakpoint
>> [linux-nat] linux_nat_wait_1: waitpid(-1, ...) returned 0, ERRNO-OK
>> [linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725658.0, not stopped
>> [linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725529.0, has pending status
>> [linux-nat] linux_nat_wait_1: trap ptid is 1725529.1725529.0.
>> [linux-nat] linux_nat_wait_1: exit
>> [linux-nat] stop_callback: kill 1725529.1725658.0 ****
>> [linux-nat] stop_callback: lwp kill -1 No such process
>> [linux-nat] wait_lwp: 1725529.1725658.0 vanished.
>>
>> And the backtrace is:
>>
>> (top-gdb) bt
>> #0  wait_lwp (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2069
>> #1  0x0000555555aa8fbf in stop_wait_callback (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2375
>> #2  0x0000555555ab12b3 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (__closure=0x0, ecall=..., args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:326
>> #3  0x0000555555ab12e2 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:320
>> #4  0x0000555555ab0610 in gdb::function_view::operator()(lwp_info*) const (this=0x7fffffffca90, args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:289
>> #5  0x0000555555aa4c2d in iterate_over_lwps(ptid_t, gdb::function_view) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:867
>> #6  0x0000555555aa8a03 in linux_stop_and_wait_all_lwps () at ../../src/gdb/linux-nat.c:2229
>> #7  0x0000555555ac8525 in try_thread_db_load_1 (info=0x555556a66dd0) at ../../src/gdb/linux-thread-db.c:923
>> #8  0x0000555555ac89d5 in try_thread_db_load (library=0x5555560eca27 "libthread_db.so.1", check_auto_load_safe=false) at ../../src/gdb/linux-thread-db.c:1024
>> #9  0x0000555555ac8eda in try_thread_db_load_from_sdir () at ../../src/gdb/linux-thread-db.c:1108
>> #10 0x0000555555ac9278 in thread_db_load_search () at ../../src/gdb/linux-thread-db.c:1163
>> #11 0x0000555555ac9518 in thread_db_load () at ../../src/gdb/linux-thread-db.c:1225
>> #12 0x0000555555ac95e1 in check_for_thread_db () at ../../src/gdb/linux-thread-db.c:1268
>> #13 0x0000555555ac9657 in thread_db_new_objfile (objfile=0x555556943ed0) at ../../src/gdb/linux-thread-db.c:1297
>> #14 0x000055555569e2d2 in std::__invoke_impl (__f=@0x5555567925d8: 0x555555ac95e8 ) at /usr/include/c++/11/bits/invoke.h:61
>> #15 0x000055555569c44a in std::__invoke_r (__fn=@0x5555567925d8: 0x555555ac95e8 ) at /usr/include/c++/11/bits/invoke.h:111
>> #16 0x0000555555699d69 in std::_Function_handler::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffce50: 0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:290
>> #17 0x0000555555b5f48b in std::function::operator()(objfile*) const (this=0x5555567925d8, __args#0=0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:590
>> #18 0x0000555555b5eba4 in gdb::observers::observable::notify (this=0x5555565b5680 , args#0=0x555556943ed0) at ../../src/gdb/../gdbsupport/observable.h:166
>> #19 0x0000555555cdd85b in symbol_file_add_with_addrs (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1131
>> #20 0x0000555555cdd9c5 in symbol_file_add_from_bfd (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1167
>> #21 0x0000555555c9dd69 in solib_read_symbols (so=0x5555569792d0, flags=...) at ../../src/gdb/solib.c:730
>> #22 0x0000555555c9e7b7 in solib_add (pattern=0x0, from_tty=0, readsyms=1) at ../../src/gdb/solib.c:1041
>> #23 0x0000555555c9f61d in handle_solib_event () at ../../src/gdb/solib.c:1315
>> #24 0x0000555555729c26 in bpstat_stop_status (aspace=0x555556606800, bp_addr=0x7ffff7fe7278, thread=0x555556816bd0, ws=..., stop_chain=0x0) at ../../src/gdb/breakpoint.c:5702
>> #25 0x0000555555a62e41 in handle_signal_stop (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6517
>> #26 0x0000555555a61479 in handle_inferior_event (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6000
>> #27 0x0000555555a5c7b5 in fetch_inferior_event () at ../../src/gdb/infrun.c:4403
>> #28 0x0000555555a35b65 in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:41
>> #29 0x0000555555aae0c9 in handle_target_event (error=0, client_data=0x0) at ../../src/gdb/linux-nat.c:4231
>>
>> Now, when I try the same on a Fedora 32 machine, I see the GDB crash
>> due to the stale LWP still in the LWP list with no corresponding
>> thread_info.  On this machine, glibc predates the changes that make it
>> possible to use libthread_db with non-threaded processes, so
>> try_thread_db_load doesn't manage to open a connection to libthread_db,
>> and thus we don't end up in linux_stop_and_wait_all_lwps, and thus the
>> stale lwp is not deleted.  And so a subsequent "kill" command crashes.
>>
>> I wrote that patch originally on an Ubuntu 20.04 machine (vs the Ubuntu
>> 22.04 I have now), and it must be that that version also predates the
>> glibc change, and thus behaves like this Fedora 32 box.  You are very
>> likely using a newer Fedora which has the glibc change.
>
> ...
>
>>> What are your thoughts on including this, or something like this with
>>> this commit?
>>> My patch, which applies on top of this commit, is included at the
>>> end of this email.  Please feel free to take any changes that you
>>> feel add value.
>>
>> I'm totally fine with such a command, though the test I had added covers
>> as much as it would, as the "kill" command fails when the maint command
>> would fail, and passes when the maint command passes.  But I'll
>> incorporate it.
>>
>
> I realized that my description of the problem above practically
> suggests a way to expose the crash everywhere -- just catch the exec
> event with "catch exec", so that the post-exec program doesn't even
> get to the libc.so.6 load event, and issue "kill" there, or use
> "maint info linux-lwps".  So I've adjusted the patch to add a new
> testcase doing that.  I've attached two patches, one adding your
> "maint info linux-lwps", now with NEWS/docs, and the updated version
> of the crash fix and testcase.
>
> WDYT?
>
> Pedro Alves
>
> From 450e0133fc884f027cce4ae65378ea5560f6464d Mon Sep 17 00:00:00 2001
> From: Andrew Burgess
> Date: Tue, 4 Apr 2023 14:50:35 +0100
> Subject: [PATCH 1/2] Add "maint info linux-lwps" command
>
> This adds a maintenance command that lets you list all the LWPs under
> control of the linux-nat target.
>
> For example:
>
>  (gdb) maint info linux-lwps
>  LWP Ptid        Thread ID
>  560948.561047.0 None
>  560948.560948.0 1.1
>
> This shows that "560948.561047.0" LWP doesn't map to any thread_info
> object, which is bogus.  We'll be using this in a testcase in a
> following patch.
>
> Co-Authored-By: Pedro Alves
> Change-Id: Ic4e9e123385976e5cd054391990124b7a20fb3f5
> ---
>  gdb/NEWS            |  3 +++
>  gdb/doc/gdb.texinfo |  4 ++++
>  gdb/linux-nat.c     | 46 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 53 insertions(+)
>
> diff --git a/gdb/NEWS b/gdb/NEWS
> index d729aa24056..3747e7d52c1 100644
> --- a/gdb/NEWS
> +++ b/gdb/NEWS
> @@ -78,6 +78,9 @@ maintenance info frame-unwinders
>  maintenance wait-for-index-cache
>    Wait until all pending writes to the index cache have completed.
>
> +maintenance info linux-lwps
> +  List all LWPs under control of the linux-nat target.
> +
>  set always-read-ctf on|off
>  show always-read-ctf
>    When off, CTF is only read if DWARF is not present.  When on, CTF is
> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> index 6c811b8be2e..398bbb88af6 100644
> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -40605,6 +40605,10 @@ module (@pxref{Disassembly In Python}), and will only be present after
>  that module has been imported.  To force the module to be imported do
>  the following:
>
> +@kindex maint info linux-lwps
> +@item maint info linux-lwps
> +Print information about LWPs under control of the Linux native target.
> +
>  @smallexample
>  (@value{GDBP}) python import gdb.disassembler
>  @end smallexample
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 944f23de01a..68816ddc999 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -4479,6 +4479,49 @@ current_lwp_ptid (void)
>    return inferior_ptid;
>  }
>
> +/* Implement 'maintenance info linux-lwps'.  Displays some basic
> +   information about all the current lwp_info objects.  */
> +
> +static void
> +maintenance_info_lwps (const char *arg, int from_tty)
> +{
> +  if (all_lwps ().size () == 0)
> +    {
> +      gdb_printf ("No Linux LWPs\n");
> +      return;
> +    }
> +
> +  /* Start the width at 8 to match the column heading below, then
> +     figure out the widest ptid string.  We'll use this to build our
> +     output table below.  */
> +  size_t ptid_width = 8;
> +  for (lwp_info *lp : all_lwps ())
> +    ptid_width = std::max (ptid_width, lp->ptid.to_string ().size ());
> +
> +  /* Setup the table headers.  */
> +  struct ui_out *uiout = current_uiout;
> +  ui_out_emit_table table_emitter (uiout, 2, -1, "linux-lwps");
> +  uiout->table_header (ptid_width, ui_left, "lwp-ptid", _("LWP Ptid"));
> +  uiout->table_header (9, ui_left, "thread-info", _("Thread ID"));
> +  uiout->table_body ();
> +
> +  /* Display one table row for each lwp_info.  */
> +  for (lwp_info *lp : all_lwps ())
> +    {
> +      ui_out_emit_tuple tuple_emitter (uiout, "lwp-entry");
> +
> +      struct thread_info *th = find_thread_ptid (linux_target, lp->ptid);

After recent changes this line becomes:

  struct thread_info *th = linux_target->find_thread (lp->ptid);

> +
> +      uiout->field_string ("lwp-ptid", lp->ptid.to_string ().c_str ());
> +      if (th == nullptr)
> +        uiout->field_string ("thread-info", "None");
> +      else
> +        uiout->field_string ("thread-info", print_full_thread_id (th));
> +
> +      uiout->message ("\n");
> +    }
> +}
> +
>  void _initialize_linux_nat ();
>  void
>  _initialize_linux_nat ()
> @@ -4516,6 +4559,9 @@ Enables printf debugging output."),
>    sigemptyset (&blocked_mask);
>
>    lwp_lwpid_htab_create ();
> +
> +  add_cmd ("linux-lwps", class_maintenance, maintenance_info_lwps,
> +           _("List the Linux LWPS."), &maintenanceinfolist);
>  }
>
> base-commit: 57573e54afb9f7ed957eec43dfd2830f2384c970
> prerequisite-patch-id: 3a896bfe4b7c66a2e3a6aa668c5ae8395e5d8a52
> --
> 2.36.0
>
> From ee0a276c08b829ae504fe0eba5badc4f7faf3676 Mon Sep 17 00:00:00 2001
> From: Pedro Alves
> Date: Wed, 13 Jul 2022 17:16:38 +0100
> Subject: [PATCH 2/2] gdb/linux: Delete all other LWPs immediately on ptrace
>  exec event
>
> I noticed that on an Ubuntu 20.04 system, after a following patch
> ("Step over clone syscall w/ breakpoint,
> TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp
> was passing cleanly, but still, we'd end up with four new unexpected
> GDB core dumps:
>
>   === gdb Summary ===
>
>  # of unexpected core files  4
>  # of expected passes        48
>
> That said patch is making the pre-existing
> gdb.threads/step-over-exec.exp testcase (almost silently) expose a
> latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
>
>  #1 - a non-leader thread execs
>  #2 - the post-exec program stops somewhere
>  #3 - you kill the inferior
>
> Instead of #3 directly, the testcase just returns, which ends up in
> gdb_exit, tearing down GDB, which kills the inferior, and is thus
> equivalent to #3 above.
>
> Vis:
>
>  $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
>  ...
>  (top-gdb) r
>  ...
>  (gdb) b main
>  ...
>  (gdb) r
>  ...
>  Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
>  69        argv0 = argv[0];
>  (gdb) c
>  Continuing.
>  [New Thread 0x7ffff7d89700 (LWP 2506975)]
>  Other going in exec.
>  Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>  process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>
>  Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
>  28        foo ();
>  (gdb) k
>  ...
>  Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
>  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>  393           return m_suspend.waitstatus_pending_p;
>  (top-gdb) bt
>  #0  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>  #1  0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
>  #2  0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
>  #3  0x0000555555a92a26 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
>  #4  0x0000555555a92a51 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
>  #5  0x0000555555a91f84 in gdb::function_view::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
>  #6  0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
>  #7  0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 ) at ../../src/gdb/linux-nat.c:3590
>  #8  0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
>  ...

As I mentioned in my other message, this backtrace includes
kill_unfollowed_child_callback, which doesn't exist yet!  I think
that's OK though, the text before the backtrace does make it clear
that you saw this problem only after applying a later patch.

>
> The root of the problem is that when a non-leader LWP execs, it just
> changes its tid to the tgid, replacing the pre-exec leader thread,
> becoming the new leader.  There's no thread exit event for the
> execing thread.
> It's as if the old pre-exec LWP vanishes without trace.  The ptrace
> man page says:
>
>  "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
>      Stop the tracee at the next execve(2).  A waitpid(2) by the
>      tracer will return a status value such that
>
>        status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
>
>      If the execing thread is not a thread group leader, the thread
>      ID is reset to thread group leader's ID before this stop.
>      Since Linux 3.0, the former thread ID can be retrieved with
>      PTRACE_GETEVENTMSG."
>
> When the core of GDB processes an exec event, it deletes all the
> threads of the inferior.  But, that is too late -- deleting the thread
> does not delete the corresponding LWP, so we end up leaving the
> pre-exec non-leader LWP stale in the LWP list.  That's what leads to
> the crash above -- linux_nat_target::kill iterates over all LWPs, and
> after the patch in question, that code will look for the corresponding
> thread_info for each LWP.  For the pre-exec non-leader LWP still
> listed, it won't find one.
>
> This patch fixes it, by deleting the pre-exec non-leader LWP (and
> thread) from the LWP/thread lists as soon as we get an exec event out
> of ptrace.
>
> GDBserver does not need an equivalent fix, because it is already doing
> this, as a side effect of mourning the pre-exec process, in
> gdbserver/linux-low.cc:
>
>  else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
>    {
>      ...
>      /* Delete the execing process and all its threads.  */
>      mourn (proc);
>      switch_to_thread (nullptr);
>
> The crash with gdb.threads/step-over-exec.exp is not observable on
> newer systems, which postdate the glibc change to move "libpthread.so"
> internals to "libc.so.6", because right after the exec, GDB traps a
> load event for "libc.so.6", which leads to GDB trying to open
> libthread_db for the post-exec inferior, and, on such systems, that
> succeeds.
> When we load libthread_db, we call linux_stop_and_wait_all_lwps,
> which, as the name suggests, stops all lwps, and then waits to see
> their stops.  While doing this, GDB detects that the pre-exec stale
> LWP is gone, and deletes it.
>
> If we use "catch exec" to stop right at the exec before the
> "libc.so.6" load event ever happens, and issue "kill" right there,
> then GDB crashes on newer systems as well.  So instead of tweaking
> gdb.threads/step-over-exec.exp to cover the fix, add a new
> gdb.threads/threads-after-exec.exp testcase that uses "catch exec".

Maybe it's worth mentioning that, because the crash itself only happens
once a later patch is applied, we use 'maint info linux-lwps' to reveal
the issue for now?

>
> Also tweak a comment in infrun.c:follow_exec referring to how
> linux-nat.c used to behave, as it would become stale otherwise.
>
> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
> ---
>  gdb/infrun.c                                  |  8 +--
>  gdb/linux-nat.c                               | 15 ++++
>  .../gdb.threads/threads-after-exec.exp        | 70 +++++++++++++++++++

Oops, this diff is missing the two source files for this test (.c and
-execd.c).  I was able to figure something out though, so I could test
the rest of this patch :)

>  3 files changed, 88 insertions(+), 5 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.threads/threads-after-exec.exp
>
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index abe49ae0f2f..93edc224622 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -1224,13 +1224,11 @@ follow_exec (ptid_t ptid, const char *exec_file_target)
>     some other thread does the exec, and even if the main thread was
>     stopped or already gone.  We may still have non-leader threads of
>     the process on our list.  E.g., on targets that don't have thread
> -   exit events (like remote); or on native Linux in non-stop mode if
> -   there were only two threads in the inferior and the non-leader
> -   one is the one that execs (and nothing forces an update of the
> -   thread list up to here).  When debugging remotely, it's best to
> +   exit events (like remote) and nothing forces an update of the
> +   thread list up to here.  When debugging remotely, it's best to
>     avoid extra traffic, when possible, so avoid syncing the thread
>     list with the target, and instead go ahead and delete all threads
> -   of the process but one that reported the event.  Note this must
> +   of the process but the one that reported the event.  Note this must
>     be done before calling update_breakpoints_after_exec, as
>     otherwise clearing the threads' resources would reference stale
>     thread breakpoints -- it may have been one of these threads that
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 68816ddc999..90ac94440b8 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -2001,6 +2001,21 @@ linux_handle_extended_wait (struct lwp_info *lp, int status)
>           thread execs, it changes its tid to the tgid, and the old
>           tgid thread might have not been resumed.  */
>        lp->resumed = 1;
> +
> +      /* All other LWPs are gone now.  We'll have received a thread
> +         exit notification for all threads other than the execing
> +         one.  That one, if it wasn't the leader, just silently
> +         changes its tid to the tgid, and the previous leader
> +         vanishes.  Since Linux 3.0, the former thread ID can be
> +         retrieved with PTRACE_GETEVENTMSG, but since we support
> +         older kernels, don't bother with it, and just walk the LWP
> +         list.  Even with PTRACE_GETEVENTMSG, we'd still need to look
> +         up the corresponding LWP object, and it would be an extra
> +         ptrace syscall, so this way may even be more efficient.  */
> +      for (lwp_info *other_lp : all_lwps_safe ())
> +        if (other_lp != lp && other_lp->ptid.pid () == lp->ptid.pid ())
> +          exit_lwp (other_lp);
> +
>        return 0;
>      }
>
> diff --git a/gdb/testsuite/gdb.threads/threads-after-exec.exp b/gdb/testsuite/gdb.threads/threads-after-exec.exp
> new file mode 100644
> index 00000000000..824dda349a6
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/threads-after-exec.exp
> @@ -0,0 +1,70 @@
> +# Copyright 2023 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test that after an exec of a non-leader thread, we don't leave the
> +# non-leader thread listed in internal thread lists, causing problems.
> +
> +standard_testfile .c -execd.c
> +
> +proc do_test { } {
> +    global srcdir subdir srcfile srcfile2 binfile testfile
> +    global decimal
> +
> +    # Compile main binary (the one that does the exec).
> +    if {[gdb_compile_pthreads $srcdir/$subdir/$srcfile $binfile \
> +             executable {debug}] != "" } {
> +        return -1
> +    }

You can do:

  if {[build_executable "failed to build main executable" \
	   $binfile $srcfile {debug pthread}] == -1} {
      return -1
  }

> +
> +    # Compile the second binary (the one that gets exec'd).
> +    if {[gdb_compile $srcdir/$subdir/$srcfile2 $binfile-execd \
> +             executable {debug}] != "" } {
> +        return -1
> +    }

And:

  if {[build_executable "failed to build execd executable" \
	   $binfile-execd $srcfile2 {debug}] == -1} {
      return -1
  }

I thought we were moving away from calling the gdb_compile* functions
directly.

Assuming the missing source files are added, this all looks great.

Reviewed-By: Andrew Burgess

Thanks,
Andrew

> +
> +    clean_restart $binfile
> +
> +    if ![runto_main] {
> +        return
> +    }
> +
> +    gdb_test "catch exec" "Catchpoint $decimal \\(exec\\)"
> +
> +    gdb_test "continue" "Catchpoint $decimal .*" "continue until exec"
> +
> +    # Confirm we only have one thread in the thread list.
> +    gdb_test "info threads" "\\* 1\[ \t\]+\[^\r\n\]+.*"
> +
> +    if {[istarget *-*-linux*] && [gdb_is_target_native]} {
> +        # Confirm there's only one LWP in the list as well, and that
> +        # it is bound to thread 1.1.
> +        set inf_pid [get_inferior_pid]
> +        gdb_test_multiple "maint info linux-lwps" "" {
> +            -wrap -re "Thread ID *\r\n$inf_pid\.$inf_pid\.0\[ \t\]+1\.1 *" {
> +                pass $gdb_test_name
> +            }
> +        }
> +    }
> +
> +    # Test that GDB is able to kill the inferior.  This used to crash
> +    # on native Linux as GDB did not dispose of the pre-exec LWP for
> +    # the non-leader (and that LWP did not have a matching thread in
> +    # the core thread list).
> +    gdb_test "with confirm off -- kill" \
> +        "\\\[Inferior 1 (.*) killed\\\]" \
> +        "kill inferior"
> +}
> +
> +do_test
> --
> 2.36.0