Re: [PATCH 03/31] gdb/linux: Delete all other LWPs immediately on ptrace exec event

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

From: Andrew Burgess <aburgess@redhat.com>
To: Pedro Alves <pedro@palves.net>, gdb-patches@sourceware.org
Subject: Re: [PATCH 03/31] gdb/linux: Delete all other LWPs immediately on ptrace exec event
Date: Fri, 26 May 2023 16:04:55 +0100	[thread overview]
Message-ID: <87pm6n5e20.fsf@redhat.com> (raw)
In-Reply-To: <5b80a2c3-3679-fb86-27f3-0dcc9c019562@palves.net>

Pedro Alves <pedro@palves.net> writes:

> Hi!
>
> On 2023-04-04 2:57 p.m., Pedro Alves wrote:
>> On 2023-03-21 2:50 p.m., Andrew Burgess wrote:
>>>
>>> I thought it was the second case, but I was so unsure that I tried the
>>> reproducer anyway.   Just in case I'm wrong, the above example doesn't
>>> seem to fail prior to this commit.
>> 
>> This surprised me, and when I tried it myself, I was even more surprised,
>> for I couldn't reproduce it either!
>> 
>> But I figured it out.
>> 
>> I'm usually using Ubuntu 22.04 for development nowadays, and in that system, indeed I can't
>> reproduce it.  Right after the exec, GDB traps a load event for "libc.so.6", which leads to
>> gdb trying to open libthread_db for the post-exec inferior, and, it succeeds.  When we load
>> libthread_db, we call linux_stop_and_wait_all_lwps, which, as the name suggests, stops all lwps,
>> and then waits to see their stops.  While doing this, GDB detects that the pre-exec stale
>> LWP is gone, and deletes it.
>> 
>> The logs show:
>> 
>> [linux-nat] linux_nat_wait_1: waitpid 1725529 received SIGTRAP - Trace/breakpoint trap (stopped)
>> [linux-nat] save_stop_reason: 1725529.1725529.0 stopped by software breakpoint
>> [linux-nat] linux_nat_wait_1: waitpid(-1, ...) returned 0, ERRNO-OK
>> [linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725658.0, not stopped
>> [linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725529.0, has pending status
>> [linux-nat] linux_nat_wait_1: trap ptid is 1725529.1725529.0.
>> [linux-nat] linux_nat_wait_1: exit
>> [linux-nat] stop_callback: kill 1725529.1725658.0 **<SIGSTOP>**
>> [linux-nat] stop_callback: lwp kill -1 No such process
>> [linux-nat] wait_lwp: 1725529.1725658.0 vanished.
>> 
>> And the backtrace is:
>> 
>> (top-gdb) bt
>> #0  wait_lwp (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2069
>> #1  0x0000555555aa8fbf in stop_wait_callback (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2375
>> #2  0x0000555555ab12b3 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (__closure=0x0, ecall=..., args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:326
>> #3  0x0000555555ab12e2 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:320
>> #4  0x0000555555ab0610 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffca90, args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:289
>> #5  0x0000555555aa4c2d in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:867
>> #6  0x0000555555aa8a03 in linux_stop_and_wait_all_lwps () at ../../src/gdb/linux-nat.c:2229
>> #7  0x0000555555ac8525 in try_thread_db_load_1 (info=0x555556a66dd0) at ../../src/gdb/linux-thread-db.c:923
>> #8  0x0000555555ac89d5 in try_thread_db_load (library=0x5555560eca27 "libthread_db.so.1", check_auto_load_safe=false) at ../../src/gdb/linux-thread-db.c:1024
>> #9  0x0000555555ac8eda in try_thread_db_load_from_sdir () at ../../src/gdb/linux-thread-db.c:1108
>> #10 0x0000555555ac9278 in thread_db_load_search () at ../../src/gdb/linux-thread-db.c:1163
>> #11 0x0000555555ac9518 in thread_db_load () at ../../src/gdb/linux-thread-db.c:1225
>> #12 0x0000555555ac95e1 in check_for_thread_db () at ../../src/gdb/linux-thread-db.c:1268
>> #13 0x0000555555ac9657 in thread_db_new_objfile (objfile=0x555556943ed0) at ../../src/gdb/linux-thread-db.c:1297
>> #14 0x000055555569e2d2 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x5555567925d8: 0x555555ac95e8 <thread_db_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:61
>> #15 0x000055555569c44a in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x5555567925d8: 0x555555ac95e8 <thread_db_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:111
>> #16 0x0000555555699d69 in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffce50: 0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:290
>> #17 0x0000555555b5f48b in std::function<void (objfile*)>::operator()(objfile*) const (this=0x5555567925d8, __args#0=0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:590
>> #18 0x0000555555b5eba4 in gdb::observers::observable<objfile*>::notify (this=0x5555565b5680 <gdb::observers::new_objfile>, args#0=0x555556943ed0) at ../../src/gdb/../gdbsupport/observable.h:166
>> #19 0x0000555555cdd85b in symbol_file_add_with_addrs (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1131
>> #20 0x0000555555cdd9c5 in symbol_file_add_from_bfd (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1167
>> #21 0x0000555555c9dd69 in solib_read_symbols (so=0x5555569792d0, flags=...) at ../../src/gdb/solib.c:730
>> #22 0x0000555555c9e7b7 in solib_add (pattern=0x0, from_tty=0, readsyms=1) at ../../src/gdb/solib.c:1041
>> #23 0x0000555555c9f61d in handle_solib_event () at ../../src/gdb/solib.c:1315
>> #24 0x0000555555729c26 in bpstat_stop_status (aspace=0x555556606800, bp_addr=0x7ffff7fe7278, thread=0x555556816bd0, ws=..., stop_chain=0x0) at ../../src/gdb/breakpoint.c:5702
>> #25 0x0000555555a62e41 in handle_signal_stop (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6517
>> #26 0x0000555555a61479 in handle_inferior_event (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6000
>> #27 0x0000555555a5c7b5 in fetch_inferior_event () at ../../src/gdb/infrun.c:4403
>> #28 0x0000555555a35b65 in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:41
>> #29 0x0000555555aae0c9 in handle_target_event (error=0, client_data=0x0) at ../../src/gdb/linux-nat.c:4231
>> 
>> 
>> Now, when I try the same on a Fedora 32 machine, I see the GDB crash due to the stale
>> LWP still in the LWP list with no corresponding thread_info.  On this
>> machine, glibc predates the changes that make it possible to use libthread_db with
>> non-threaded processes, so try_thread_db_load doesn't manage to open a connection
>> to libthread_db, and thus we don't end up in linux_stop_and_wait_all_lwps, and thus
>> the stale lwp is not deleted.  And so a subsequent "kill" command crashes.
>> 
>> I wrote that patch originally on an Ubuntu 20.04 machine (vs the Ubuntu 22.04 I have now),
>> and it must be that that version also predates the glibc change, and thus behaves like
>> this Fedora 32 box.  You are very likely using a newer Fedora which has the glibc change.
>
> ...
>
>>> What are your thoughts on including this, or something like this with
>>> this commit?  My patch, which applies on top of this commit, is included
>>> at the end of this email.  Please feel free to take any changes that you
>>> feel add value.
>> 
>> I'm totally fine with such a command, though the test I had added covers
>> as much as it would, as the "kill" command fails when the maint command
>> would fail, and passes when the maint command passes.  But I'll incorporate
>> it.
>> 
>
> I realized that my description of the problem above practically
> suggests a way to expose the crash everywhere -- just catch the exec
> event with "catch exec", so that the post-exec program doesn't even
> get to the libc.so.6 load event, and issue "kill" there, or use "maint info linux-lwps".
> So I've adjusted the patch to add a new testcase doing that.  I've attached two
> patches, one adding your "maint info linux-lwps", now with NEWS/docs, and
> the updated version of the crash fix and testcase.
>
> WDYT?
>
> Pedro Alves
> From 450e0133fc884f027cce4ae65378ea5560f6464d Mon Sep 17 00:00:00 2001
> From: Andrew Burgess <aburgess@redhat.com>
> Date: Tue, 4 Apr 2023 14:50:35 +0100
> Subject: [PATCH 1/2] Add "maint info linux-lwps" command
>
> This adds a maintenance command that lets you list all the LWPs under
> control of the linux-nat target.
>
> For example:
>
>  (gdb) maint info linux-lwps
>  LWP Ptid        Thread ID
>  560948.561047.0 None
>  560948.560948.0 1.1
>
> This shows that "560948.561047.0" LWP doesn't map to any thread_info
> object, which is bogus.  We'll be using this in a testcase in a
> following patch.
>
> Co-Authored-By: Pedro Alves <pedro@palves.net>
> Change-Id: Ic4e9e123385976e5cd054391990124b7a20fb3f5
> ---
>  gdb/NEWS            |  3 +++
>  gdb/doc/gdb.texinfo |  4 ++++
>  gdb/linux-nat.c     | 46 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 53 insertions(+)
>
> diff --git a/gdb/NEWS b/gdb/NEWS
> index d729aa24056..3747e7d52c1 100644
> --- a/gdb/NEWS
> +++ b/gdb/NEWS
> @@ -78,6 +78,9 @@ maintenance info frame-unwinders
>  maintenance wait-for-index-cache
>    Wait until all pending writes to the index cache have completed.
>  
> +maintenance info linux-lwps
> +  List all LWPs under control of the linux-nat target.
> +
>  set always-read-ctf on|off
>  show always-read-ctf
>    When off, CTF is only read if DWARF is not present.  When on, CTF is
> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> index 6c811b8be2e..398bbb88af6 100644
> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -40605,6 +40605,10 @@ module (@pxref{Disassembly In Python}), and will only be present after
>  that module has been imported.  To force the module to be imported do
>  the following:
>  
> +@kindex maint info linux-lwps
> +@item maint info linux-lwps
> +Print information about LWPs under control of the Linux native target.
> +
>  @smallexample
>  (@value{GDBP}) python import gdb.disassembler
>  @end smallexample
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 944f23de01a..68816ddc999 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -4479,6 +4479,49 @@ current_lwp_ptid (void)
>    return inferior_ptid;
>  }
>  
> +/* Implement 'maintenance info linux-lwps'.  Displays some basic
> +   information about all the current lwp_info objects.  */
> +
> +static void
> +maintenance_info_lwps (const char *arg, int from_tty)
> +{
> +  if (all_lwps ().size () == 0)
> +    {
> +      gdb_printf ("No Linux LWPs\n");
> +      return;
> +    }
> +
> +  /* Start the width at 8 to match the column heading below, then
> +     figure out the widest ptid string.  We'll use this to build our
> +     output table below.  */
> +  size_t ptid_width = 8;
> +  for (lwp_info *lp : all_lwps ())
> +    ptid_width = std::max (ptid_width, lp->ptid.to_string ().size ());
> +
> +  /* Setup the table headers.  */
> +  struct ui_out *uiout = current_uiout;
> +  ui_out_emit_table table_emitter (uiout, 2, -1, "linux-lwps");
> +  uiout->table_header (ptid_width, ui_left, "lwp-ptid", _("LWP Ptid"));
> +  uiout->table_header (9, ui_left, "thread-info", _("Thread ID"));
> +  uiout->table_body ();
> +
> +  /* Display one table row for each lwp_info.  */
> +  for (lwp_info *lp : all_lwps ())
> +    {
> +      ui_out_emit_tuple tuple_emitter (uiout, "lwp-entry");
> +
> +      struct thread_info *th = find_thread_ptid (linux_target, lp->ptid);

After recent changes this line becomes:

  struct thread_info *th = linux_target->find_thread (lp->ptid);

> +
> +      uiout->field_string ("lwp-ptid", lp->ptid.to_string ().c_str ());
> +      if (th == nullptr)
> +	uiout->field_string ("thread-info", "None");
> +      else
> +	uiout->field_string ("thread-info", print_full_thread_id (th));
> +
> +      uiout->message ("\n");
> +    }
> +}
> +
>  void _initialize_linux_nat ();
>  void
>  _initialize_linux_nat ()
> @@ -4516,6 +4559,9 @@ Enables printf debugging output."),
>    sigemptyset (&blocked_mask);
>  
>    lwp_lwpid_htab_create ();
> +
> +  add_cmd ("linux-lwps", class_maintenance, maintenance_info_lwps,
> +	 _("List the Linux LWPS."), &maintenanceinfolist);
>  }
>  \f
>  
>
> base-commit: 57573e54afb9f7ed957eec43dfd2830f2384c970
> prerequisite-patch-id: 3a896bfe4b7c66a2e3a6aa668c5ae8395e5d8a52
> -- 
> 2.36.0
>
> From ee0a276c08b829ae504fe0eba5badc4f7faf3676 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <pedro@palves.net>
> Date: Wed, 13 Jul 2022 17:16:38 +0100
> Subject: [PATCH 2/2] gdb/linux: Delete all other LWPs immediately on ptrace
>  exec event
>
> I noticed that on an Ubuntu 20.04 system, after a following patch
> ("Step over clone syscall w/ breakpoint,
> TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp
> was passing cleanly, but still, we'd end up with four new unexpected
> GDB core dumps:
>
> 		 === gdb Summary ===
>
>  # of unexpected core files      4
>  # of expected passes            48
>
> That said patch is making the pre-existing
> gdb.threads/step-over-exec.exp testcase (almost silently) expose a
> latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
>
>  #1 - a non-leader thread execs
>  #2 - the post-exec program stops somewhere
>  #3 - you kill the inferior
>
> Instead of #3 directly, the testcase just returns, which ends up in
> gdb_exit, tearing down GDB, which kills the inferior, and is thus
> equivalent to #3 above.
>
> Vis:
>
>  $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
>  ...
>  (top-gdb) r
>  ...
>  (gdb) b main
>  ...
>  (gdb) r
>  ...
>  Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
>  69        argv0 = argv[0];
>  (gdb) c
>  Continuing.
>  [New Thread 0x7ffff7d89700 (LWP 2506975)]
>  Other going in exec.
>  Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>  process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>
>  Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
>  28        foo ();
>  (gdb) k
>  ...
>  Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
>  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>  393         return m_suspend.waitstatus_pending_p;
>  (top-gdb) bt
>  #0  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>  #1  0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
>  #2  0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
>  #3  0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
>  #4  0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
>  #5  0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
>  #6  0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
>  #7  0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
>  #8  0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
>  ...

As I mentioned in my other message, this backtrace includes
kill_unfollowed_child_callback, which doesn't exist yet!  I think that's
OK though, the text before the backtrace does make it clear that you saw
this problem only after applying a later patch.

>
> The root of the problem is that when a non-leader LWP execs, it just
> changes its tid to the tgid, replacing the pre-exec leader thread,
> becoming the new leader.  There's no thread exit event for the execing
> thread.  It's as if the old pre-exec LWP vanishes without trace.  The
> ptrace man page says:
>
> "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
> 	Stop the tracee at the next execve(2).  A waitpid(2) by the
> 	tracer will return a status value such that
>
> 	  status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
>
> 	If the execing thread is not a thread group leader, the thread
> 	ID is reset to thread group leader's ID before this stop.
> 	Since Linux 3.0, the former thread ID can be retrieved with
> 	PTRACE_GETEVENTMSG."
>
> When the core of GDB processes an exec events, it deletes all the
> threads of the inferior.  But, that is too late -- deleting the thread
> does not delete the corresponding LWP, so we end leaving the pre-exec
> non-leader LWP stale in the LWP list.  That's what leads to the crash
> above -- linux_nat_target::kill iterates over all LWPs, and after the
> patch in question, that code will look for the corresponding
> thread_info for each LWP.  For the pre-exec non-leader LWP still
> listed, won't find one.
>
> This patch fixes it, by deleting the pre-exec non-leader LWP (and
> thread) from the LWP/thread lists as soon as we get an exec event out
> of ptrace.
>
> GDBserver does not need an equivalent fix, because it is already doing
> this, as side effect of mourning the pre-exec process, in
> gdbserver/linux-low.cc:
>
>   else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
>     {
> ...
>       /* Delete the execing process and all its threads.  */
>       mourn (proc);
>       switch_to_thread (nullptr);
>
>
> The crash with gdb.threads/step-over-exec.exp is not observable on
> newer systems, which postdate the glibc change to move "libpthread.so"
> internals to "libc.so.6", because right after the exec, GDB traps a
> load event for "libc.so.6", which leads to GDB trying to open
> libthread_db for the post-exec inferior, and, on such systems that
> succeeds.  When we load libthread_db, we call
> linux_stop_and_wait_all_lwps, which, as the name suggests, stops all
> lwps, and then waits to see their stops.  While doing this, GDB
> detects that the pre-exec stale LWP is gone, and deletes it.
>
> If we use "catch exec" to stop right at the exec before the
> "libc.so.6" load event ever happens, and issue "kill" right there,
> then GDB crashes on newer systems as well.  So instead of tweaking
> gdb.threads/step-over-exec.exp to cover the fix, add a new
> gdb.threads/threads-after-exec.exp testcase that uses "catch exec".

Maybe it's worth mentioning that because the crash itself only happens
once a later patch is applied we use 'maint info linux-lwps' to reveal
the issue for now?

>
> Also tweak a comment in infrun.c:follow_exec referring to how
> linux-nat.c used to behave, as it would become stale otherwise.
>
> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
> ---
>  gdb/infrun.c                                  |  8 +--
>  gdb/linux-nat.c                               | 15 ++++
>  .../gdb.threads/threads-after-exec.exp        | 70 +++++++++++++++++++

Oops, this diff is missing the two source files for this test (.c and
-execd.c).  I was able to figure something out though so I could test
the rest of this patch :)

>  3 files changed, 88 insertions(+), 5 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.threads/threads-after-exec.exp
>
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index abe49ae0f2f..93edc224622 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -1224,13 +1224,11 @@ follow_exec (ptid_t ptid, const char *exec_file_target)
>       some other thread does the exec, and even if the main thread was
>       stopped or already gone.  We may still have non-leader threads of
>       the process on our list.  E.g., on targets that don't have thread
> -     exit events (like remote); or on native Linux in non-stop mode if
> -     there were only two threads in the inferior and the non-leader
> -     one is the one that execs (and nothing forces an update of the
> -     thread list up to here).  When debugging remotely, it's best to
> +     exit events (like remote) and nothing forces an update of the
> +     thread list up to here.  When debugging remotely, it's best to
>       avoid extra traffic, when possible, so avoid syncing the thread
>       list with the target, and instead go ahead and delete all threads
> -     of the process but one that reported the event.  Note this must
> +     of the process but the one that reported the event.  Note this must
>       be done before calling update_breakpoints_after_exec, as
>       otherwise clearing the threads' resources would reference stale
>       thread breakpoints -- it may have been one of these threads that
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 68816ddc999..90ac94440b8 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -2001,6 +2001,21 @@ linux_handle_extended_wait (struct lwp_info *lp, int status)
>  	 thread execs, it changes its tid to the tgid, and the old
>  	 tgid thread might have not been resumed.  */
>        lp->resumed = 1;
> +
> +      /* All other LWPs are gone now.  We'll have received a thread
> +	 exit notification for all threads other the execing one.
> +	 That one, if it wasn't the leader, just silently changes its
> +	 tid to the tgid, and the previous leader vanishes.  Since
> +	 Linux 3.0, the former thread ID can be retrieved with
> +	 PTRACE_GETEVENTMSG, but since we support older kernels, don't
> +	 bother with it, and just walk the LWP list.  Even with
> +	 PTRACE_GETEVENTMSG, we'd still need to lookup the
> +	 corresponding LWP object, and it would be an extra ptrace
> +	 syscall, so this way may even be more efficient.  */
> +      for (lwp_info *other_lp : all_lwps_safe ())
> +	if (other_lp != lp && other_lp->ptid.pid () == lp->ptid.pid ())
> +	  exit_lwp (other_lp);
> +
>        return 0;
>      }
>  
> diff --git a/gdb/testsuite/gdb.threads/threads-after-exec.exp b/gdb/testsuite/gdb.threads/threads-after-exec.exp
> new file mode 100644
> index 00000000000..824dda349a6
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/threads-after-exec.exp
> @@ -0,0 +1,70 @@
> +# Copyright 2023 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test that after an exec of a non-leader thread, we don't leave the
> +# non-leader thread listed in internal thread lists, causing problems.
> +
> +standard_testfile .c -execd.c
> +
> +proc do_test { } {
> +    global srcdir subdir srcfile srcfile2 binfile testfile
> +    global decimal
> +
> +    # Compile main binary (the one that does the exec).
> +    if {[gdb_compile_pthreads $srcdir/$subdir/$srcfile $binfile \
> +	     executable {debug}] != "" } {
> +	return -1
> +    }

You can do:

    if {[build_executable "failed to build main executable" \
             $binfile $srcfile {debug pthread}] == -1} {
	return -1
    }

> +
> +    # Compile the second binary (the one that gets exec'd).
> +    if {[gdb_compile $srcdir/$subdir/$srcfile2 $binfile-execd \
> +	     executable {debug}] != "" } {
> +	return -1
> +    }

And:

    if {[build_executable "failed to build execd executable" \
             $binfile-execd $srcfile2 {debug}] == -1} {
	return -1
    }

I thought we were moving away from calling the gdb_compile* functions
directly.

Assuming the missing source files are added, this all looks great.

Reviewed-By: Andrew Burgess <aburgess@redhat.com>

Thanks,
Andrew

> +
> +    clean_restart $binfile
> +
> +    if ![runto_main] {
> +	return
> +    }
> +
> +    gdb_test "catch exec" "Catchpoint $decimal \\(exec\\)"
> +
> +    gdb_test "continue" "Catchpoint $decimal .*" "continue until exec"
> +
> +    # Confirm we only have one thread in the thread list.
> +    gdb_test "info threads" "\\* 1\[ \t\]+\[^\r\n\]+.*"
> +
> +    if {[istarget *-*-linux*] && [gdb_is_target_native]} {
> +	# Confirm there's only one LWP in the list as well, and that
> +	# it is bound to thread 1.1.
> +	set inf_pid [get_inferior_pid]
> +	gdb_test_multiple "maint info linux-lwps" "" {
> +	    -wrap -re "Thread ID *\r\n$inf_pid\.$inf_pid\.0\[ \t\]+1\.1 *" {
> +		pass $gdb_test_name
> +	    }
> +	}
> +    }
> +
> +    # Test that GDB is able to kill the inferior.  This used to crash
> +    # on native Linux as GDB did not dispose of the pre-exec LWP for
> +    # the non-leader (and that LWP did not have a matching thread in
> +    # the core thread list).
> +    gdb_test "with confirm off -- kill" \
> +	"\\\[Inferior 1 (.*) killed\\\]" \
> +	"kill inferior"
> +}
> +
> +do_test
> -- 
> 2.36.0

next prev parent reply	other threads:[~2023-05-26 15:05 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-12 20:30 [PATCH 00/31] Step over thread clone and thread exit Pedro Alves
2022-12-12 20:30 ` [PATCH 01/31] displaced step: pass down target_waitstatus instead of gdb_signal Pedro Alves
2023-02-03 10:44   ` Andrew Burgess
2023-03-10 17:15     ` Pedro Alves
2023-03-16 16:07       ` Andrew Burgess
2023-03-22 21:29         ` Andrew Burgess
2023-03-23 15:15           ` Pedro Alves
2023-03-27 12:40             ` Andrew Burgess
2023-03-27 16:21               ` Pedro Alves
2022-12-12 20:30 ` [PATCH 02/31] linux-nat: introduce pending_status_str Pedro Alves
2023-02-03 12:00   ` Andrew Burgess
2023-03-10 17:15     ` Pedro Alves
2023-03-16 16:19       ` Andrew Burgess
2023-03-27 18:05         ` Pedro Alves
2022-12-12 20:30 ` [PATCH 03/31] gdb/linux: Delete all other LWPs immediately on ptrace exec event Pedro Alves
2023-03-21 14:50   ` Andrew Burgess
2023-04-04 13:57     ` Pedro Alves
2023-04-14 19:29       ` Pedro Alves
2023-05-26 15:04         ` Andrew Burgess [this message]
2023-11-13 14:04           ` Pedro Alves
2023-05-26 14:45       ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 04/31] Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONED Pedro Alves
2023-02-04 15:38   ` Andrew Burgess
2023-03-10 17:16     ` Pedro Alves
2023-03-21 16:06       ` Andrew Burgess
2023-11-13 14:05         ` Pedro Alves
2022-12-12 20:30 ` [PATCH 05/31] Support clone events in the remote protocol Pedro Alves
2023-03-22 15:46   ` Andrew Burgess
2023-11-13 14:05     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 06/31] Avoid duplicate QThreadEvents packets Pedro Alves
2023-05-26 15:53   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 07/31] enum_flags to_string Pedro Alves
2023-01-30 20:07   ` Simon Marchi
2022-12-12 20:30 ` [PATCH 08/31] Thread options & clone events (core + remote) Pedro Alves
2023-01-31 12:25   ` Lancelot SIX
2023-03-10 19:16     ` Pedro Alves
2023-06-06 13:29       ` Andrew Burgess
2023-11-13 14:07         ` Pedro Alves
2022-12-12 20:30 ` [PATCH 09/31] Thread options & clone events (native Linux) Pedro Alves
2023-06-06 13:43   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 10/31] Thread options & clone events (Linux GDBserver) Pedro Alves
2023-06-06 14:12   ` Andrew Burgess
2023-11-13 14:07     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 11/31] gdbserver: Hide and don't detach pending clone children Pedro Alves
2023-06-07 16:10   ` Andrew Burgess
2023-11-13 14:08     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 12/31] Remove gdb/19675 kfails (displaced stepping + clone) Pedro Alves
2023-06-07 17:08   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 13/31] Add test for stepping over clone syscall Pedro Alves
2023-06-07 17:42   ` Andrew Burgess
2023-11-13 14:09     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 14/31] all-stop/synchronous RSP support thread-exit events Pedro Alves
2023-06-07 17:52   ` Andrew Burgess
2023-11-13 14:11     ` Pedro Alves
2023-12-15 18:15       ` Pedro Alves
2022-12-12 20:30 ` [PATCH 15/31] gdbserver/linux-low.cc: Ignore event_ptid if TARGET_WAITKIND_IGNORE Pedro Alves
2022-12-12 20:30 ` [PATCH 16/31] Move deleting thread on TARGET_WAITKIND_THREAD_EXITED to core Pedro Alves
2023-06-08 12:27   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 17/31] Introduce GDB_THREAD_OPTION_EXIT thread option, fix step-over-thread-exit Pedro Alves
2023-06-08 13:17   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 18/31] Implement GDB_THREAD_OPTION_EXIT support for Linux GDBserver Pedro Alves
2023-06-08 14:14   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 19/31] Implement GDB_THREAD_OPTION_EXIT support for native Linux Pedro Alves
2023-06-08 14:17   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 20/31] gdb: clear step over information on thread exit (PR gdb/27338) Pedro Alves
2023-06-08 15:29   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 21/31] stop_all_threads: (re-)enable async before waiting for stops Pedro Alves
2023-06-08 15:49   ` Andrew Burgess
2023-11-13 14:12     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 22/31] gdbserver: Queue no-resumed event after thread exit Pedro Alves
2023-06-08 18:16   ` Andrew Burgess
2023-11-13 14:12     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 23/31] Don't resume new threads if scheduler-locking is in effect Pedro Alves
2023-06-08 18:24   ` Andrew Burgess
2023-11-13 14:12     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 24/31] Report thread exit event for leader if reporting thread exit events Pedro Alves
2023-06-09 13:11   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 25/31] Ignore failure to read PC when resuming Pedro Alves
2023-06-10 10:33   ` Andrew Burgess
2023-11-13 14:13     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 26/31] gdb/testsuite/lib/my-syscalls.S: Refactor new SYSCALL macro Pedro Alves
2023-06-10 10:33   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 27/31] Testcases for stepping over thread exit syscall (PR gdb/27338) Pedro Alves
2023-06-12  9:53   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 28/31] Document remote clone events, and QThreadOptions packet Pedro Alves
2023-06-05 15:53   ` Andrew Burgess
2023-11-13 14:13     ` Pedro Alves
2023-06-12 12:06   ` Andrew Burgess
2023-11-13 14:15     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 29/31] inferior::clear_thread_list always silent Pedro Alves
2023-06-12 12:20   ` Andrew Burgess
2022-12-12 20:31 ` [PATCH 30/31] Centralize "[Thread ...exited]" notifications Pedro Alves
2023-02-04 16:05   ` Andrew Burgess
2023-03-10 17:21     ` Pedro Alves
2023-02-16 15:40   ` Andrew Burgess
2023-06-12 12:23     ` Andrew Burgess
2022-12-12 20:31 ` [PATCH 31/31] Cancel execution command on thread exit, when stepping, nexting, etc Pedro Alves
2023-06-12 13:12   ` Andrew Burgess
2023-01-24 19:47 ` [PATCH v3 00/31] Step over thread clone and thread exit Pedro Alves
2023-11-13 14:24 ` [PATCH " Pedro Alves

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pm6n5e20.fsf@redhat.com \
    --to=aburgess@redhat.com \
    --cc=gdb-patches@sourceware.org \
    --cc=pedro@palves.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).