From: Andrew Burgess <aburgess@redhat.com>
To: Pedro Alves <pedro@palves.net>, gdb-patches@sourceware.org
Subject: Re: [PATCH 03/31] gdb/linux: Delete all other LWPs immediately on ptrace exec event
Date: Fri, 26 May 2023 16:04:55 +0100 [thread overview]
Message-ID: <87pm6n5e20.fsf@redhat.com> (raw)
In-Reply-To: <5b80a2c3-3679-fb86-27f3-0dcc9c019562@palves.net>
Pedro Alves <pedro@palves.net> writes:
> Hi!
>
> On 2023-04-04 2:57 p.m., Pedro Alves wrote:
>> On 2023-03-21 2:50 p.m., Andrew Burgess wrote:
>>>
>>> I thought it was the second case, but I was so unsure that I tried the
>>> reproducer anyway. Just in case I'm wrong, the above example doesn't
>>> seem to fail prior to this commit.
>>
>> This surprised me, and when I tried it myself, I was even more surprised,
>> for I couldn't reproduce it either!
>>
>> But I figured it out.
>>
>> I'm usually using Ubuntu 22.04 for development nowadays, and in that system, indeed I can't
>> reproduce it. Right after the exec, GDB traps a load event for "libc.so.6", which leads to
>> gdb trying to open libthread_db for the post-exec inferior, and, it succeeds. When we load
>> libthread_db, we call linux_stop_and_wait_all_lwps, which, as the name suggests, stops all lwps,
>> and then waits to see their stops. While doing this, GDB detects that the pre-exec stale
>> LWP is gone, and deletes it.
>>
>> The logs show:
>>
>> [linux-nat] linux_nat_wait_1: waitpid 1725529 received SIGTRAP - Trace/breakpoint trap (stopped)
>> [linux-nat] save_stop_reason: 1725529.1725529.0 stopped by software breakpoint
>> [linux-nat] linux_nat_wait_1: waitpid(-1, ...) returned 0, ERRNO-OK
>> [linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725658.0, not stopped
>> [linux-nat] resume_stopped_resumed_lwps: NOT resuming LWP 1725529.1725529.0, has pending status
>> [linux-nat] linux_nat_wait_1: trap ptid is 1725529.1725529.0.
>> [linux-nat] linux_nat_wait_1: exit
>> [linux-nat] stop_callback: kill 1725529.1725658.0 **<SIGSTOP>**
>> [linux-nat] stop_callback: lwp kill -1 No such process
>> [linux-nat] wait_lwp: 1725529.1725658.0 vanished.
>>
>> And the backtrace is:
>>
>> (top-gdb) bt
>> #0 wait_lwp (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2069
>> #1 0x0000555555aa8fbf in stop_wait_callback (lp=0x555556f37350) at ../../src/gdb/linux-nat.c:2375
>> #2 0x0000555555ab12b3 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (__closure=0x0, ecall=..., args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:326
>> #3 0x0000555555ab12e2 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:320
>> #4 0x0000555555ab0610 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffca90, args#0=0x555556f37350) at ../../src/gdb/../gdbsupport/function-view.h:289
>> #5 0x0000555555aa4c2d in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:867
>> #6 0x0000555555aa8a03 in linux_stop_and_wait_all_lwps () at ../../src/gdb/linux-nat.c:2229
>> #7 0x0000555555ac8525 in try_thread_db_load_1 (info=0x555556a66dd0) at ../../src/gdb/linux-thread-db.c:923
>> #8 0x0000555555ac89d5 in try_thread_db_load (library=0x5555560eca27 "libthread_db.so.1", check_auto_load_safe=false) at ../../src/gdb/linux-thread-db.c:1024
>> #9 0x0000555555ac8eda in try_thread_db_load_from_sdir () at ../../src/gdb/linux-thread-db.c:1108
>> #10 0x0000555555ac9278 in thread_db_load_search () at ../../src/gdb/linux-thread-db.c:1163
>> #11 0x0000555555ac9518 in thread_db_load () at ../../src/gdb/linux-thread-db.c:1225
>> #12 0x0000555555ac95e1 in check_for_thread_db () at ../../src/gdb/linux-thread-db.c:1268
>> #13 0x0000555555ac9657 in thread_db_new_objfile (objfile=0x555556943ed0) at ../../src/gdb/linux-thread-db.c:1297
>> #14 0x000055555569e2d2 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x5555567925d8: 0x555555ac95e8 <thread_db_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:61
>> #15 0x000055555569c44a in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x5555567925d8: 0x555555ac95e8 <thread_db_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:111
>> #16 0x0000555555699d69 in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffce50: 0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:290
>> #17 0x0000555555b5f48b in std::function<void (objfile*)>::operator()(objfile*) const (this=0x5555567925d8, __args#0=0x555556943ed0) at /usr/include/c++/11/bits/std_function.h:590
>> #18 0x0000555555b5eba4 in gdb::observers::observable<objfile*>::notify (this=0x5555565b5680 <gdb::observers::new_objfile>, args#0=0x555556943ed0) at ../../src/gdb/../gdbsupport/observable.h:166
>> #19 0x0000555555cdd85b in symbol_file_add_with_addrs (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1131
>> #20 0x0000555555cdd9c5 in symbol_file_add_from_bfd (abfd=..., name=0x5555569794e0 "/lib/x86_64-linux-gnu/libc.so.6", add_flags=..., addrs=0x7fffffffd0c0, flags=..., parent=0x0) at ../../src/gdb/symfile.c:1167
>> #21 0x0000555555c9dd69 in solib_read_symbols (so=0x5555569792d0, flags=...) at ../../src/gdb/solib.c:730
>> #22 0x0000555555c9e7b7 in solib_add (pattern=0x0, from_tty=0, readsyms=1) at ../../src/gdb/solib.c:1041
>> #23 0x0000555555c9f61d in handle_solib_event () at ../../src/gdb/solib.c:1315
>> #24 0x0000555555729c26 in bpstat_stop_status (aspace=0x555556606800, bp_addr=0x7ffff7fe7278, thread=0x555556816bd0, ws=..., stop_chain=0x0) at ../../src/gdb/breakpoint.c:5702
>> #25 0x0000555555a62e41 in handle_signal_stop (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6517
>> #26 0x0000555555a61479 in handle_inferior_event (ecs=0x7fffffffd670) at ../../src/gdb/infrun.c:6000
>> #27 0x0000555555a5c7b5 in fetch_inferior_event () at ../../src/gdb/infrun.c:4403
>> #28 0x0000555555a35b65 in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:41
>> #29 0x0000555555aae0c9 in handle_target_event (error=0, client_data=0x0) at ../../src/gdb/linux-nat.c:4231
>>
>>
>> Now, when I try the same on a Fedora 32 machine, I see the GDB crash due to the stale
>> LWP still in the LWP list with no corresponding thread_info. On this
>> machine, glibc predates the changes that make it possible to use libthread_db with
>> non-threaded processes, so try_thread_db_load doesn't manage to open a connection
>> to libthread_db, and thus we don't end up in linux_stop_and_wait_all_lwps, and thus
>> the stale lwp is not deleted. And so a subsequent "kill" command crashes.
>>
>> I wrote that patch originally on an Ubuntu 20.04 machine (vs the Ubuntu 22.04 I have now),
>> and it must be that that version also predates the glibc change, and thus behaves like
>> this Fedora 32 box. You are very likely using a newer Fedora which has the glibc change.
>
> ...
>
>>> What are your thoughts on including this, or something like this with
>>> this commit? My patch, which applies on top of this commit, is included
>>> at the end of this email. Please feel free to take any changes that you
>>> feel add value.
>>
>> I'm totally fine with such a command, though the test I had added covers
>> as much as it would, as the "kill" command fails when the maint command
>> would fail, and passes when the maint command passes. But I'll incorporate
>> it.
>>
>
> I realized that my description of the problem above practically
> suggests a way to expose the crash everywhere -- just catch the exec
> event with "catch exec", so that the post-exec program doesn't even
> get to the libc.so.6 load event, and issue "kill" there, or use "maint info linux-lwps".
> So I've adjusted the patch to add a new testcase doing that. I've attached two
> patches, one adding your "maint info linux-lwps", now with NEWS/docs, and
> the updated version of the crash fix and testcase.
>
> WDYT?
>
> Pedro Alves
> From 450e0133fc884f027cce4ae65378ea5560f6464d Mon Sep 17 00:00:00 2001
> From: Andrew Burgess <aburgess@redhat.com>
> Date: Tue, 4 Apr 2023 14:50:35 +0100
> Subject: [PATCH 1/2] Add "maint info linux-lwps" command
>
> This adds a maintenance command that lets you list all the LWPs under
> control of the linux-nat target.
>
> For example:
>
> (gdb) maint info linux-lwps
> LWP Ptid Thread ID
> 560948.561047.0 None
> 560948.560948.0 1.1
>
> This shows that "560948.561047.0" LWP doesn't map to any thread_info
> object, which is bogus. We'll be using this in a testcase in a
> following patch.
>
> Co-Authored-By: Pedro Alves <pedro@palves.net>
> Change-Id: Ic4e9e123385976e5cd054391990124b7a20fb3f5
> ---
> gdb/NEWS | 3 +++
> gdb/doc/gdb.texinfo | 4 ++++
> gdb/linux-nat.c | 46 +++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 53 insertions(+)
>
> diff --git a/gdb/NEWS b/gdb/NEWS
> index d729aa24056..3747e7d52c1 100644
> --- a/gdb/NEWS
> +++ b/gdb/NEWS
> @@ -78,6 +78,9 @@ maintenance info frame-unwinders
> maintenance wait-for-index-cache
> Wait until all pending writes to the index cache have completed.
>
> +maintenance info linux-lwps
> + List all LWPs under control of the linux-nat target.
> +
> set always-read-ctf on|off
> show always-read-ctf
> When off, CTF is only read if DWARF is not present. When on, CTF is
> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> index 6c811b8be2e..398bbb88af6 100644
> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -40605,6 +40605,10 @@ module (@pxref{Disassembly In Python}), and will only be present after
> that module has been imported. To force the module to be imported do
> the following:
>
> +@kindex maint info linux-lwps
> +@item maint info linux-lwps
> +Print information about LWPs under control of the Linux native target.
> +
> @smallexample
> (@value{GDBP}) python import gdb.disassembler
> @end smallexample
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 944f23de01a..68816ddc999 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -4479,6 +4479,49 @@ current_lwp_ptid (void)
> return inferior_ptid;
> }
>
> +/* Implement 'maintenance info linux-lwps'. Displays some basic
> + information about all the current lwp_info objects. */
> +
> +static void
> +maintenance_info_lwps (const char *arg, int from_tty)
> +{
> + if (all_lwps ().size () == 0)
> + {
> + gdb_printf ("No Linux LWPs\n");
> + return;
> + }
> +
> + /* Start the width at 8 to match the column heading below, then
> + figure out the widest ptid string. We'll use this to build our
> + output table below. */
> + size_t ptid_width = 8;
> + for (lwp_info *lp : all_lwps ())
> + ptid_width = std::max (ptid_width, lp->ptid.to_string ().size ());
> +
> + /* Setup the table headers. */
> + struct ui_out *uiout = current_uiout;
> + ui_out_emit_table table_emitter (uiout, 2, -1, "linux-lwps");
> + uiout->table_header (ptid_width, ui_left, "lwp-ptid", _("LWP Ptid"));
> + uiout->table_header (9, ui_left, "thread-info", _("Thread ID"));
> + uiout->table_body ();
> +
> + /* Display one table row for each lwp_info. */
> + for (lwp_info *lp : all_lwps ())
> + {
> + ui_out_emit_tuple tuple_emitter (uiout, "lwp-entry");
> +
> + struct thread_info *th = find_thread_ptid (linux_target, lp->ptid);
After recent changes this line becomes:
struct thread_info *th = linux_target->find_thread (lp->ptid);
> +
> + uiout->field_string ("lwp-ptid", lp->ptid.to_string ().c_str ());
> + if (th == nullptr)
> + uiout->field_string ("thread-info", "None");
> + else
> + uiout->field_string ("thread-info", print_full_thread_id (th));
> +
> + uiout->message ("\n");
> + }
> +}
> +
> void _initialize_linux_nat ();
> void
> _initialize_linux_nat ()
> @@ -4516,6 +4559,9 @@ Enables printf debugging output."),
> sigemptyset (&blocked_mask);
>
> lwp_lwpid_htab_create ();
> +
> + add_cmd ("linux-lwps", class_maintenance, maintenance_info_lwps,
> + _("List the Linux LWPS."), &maintenanceinfolist);
> }
> \f
>
>
> base-commit: 57573e54afb9f7ed957eec43dfd2830f2384c970
> prerequisite-patch-id: 3a896bfe4b7c66a2e3a6aa668c5ae8395e5d8a52
> --
> 2.36.0
>
> From ee0a276c08b829ae504fe0eba5badc4f7faf3676 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <pedro@palves.net>
> Date: Wed, 13 Jul 2022 17:16:38 +0100
> Subject: [PATCH 2/2] gdb/linux: Delete all other LWPs immediately on ptrace
> exec event
>
> I noticed that on an Ubuntu 20.04 system, after a following patch
> ("Step over clone syscall w/ breakpoint,
> TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp
> was passing cleanly, but still, we'd end up with four new unexpected
> GDB core dumps:
>
> === gdb Summary ===
>
> # of unexpected core files 4
> # of expected passes 48
>
> That said patch is making the pre-existing
> gdb.threads/step-over-exec.exp testcase (almost silently) expose a
> latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
>
> #1 - a non-leader thread execs
> #2 - the post-exec program stops somewhere
> #3 - you kill the inferior
>
> Instead of #3 directly, the testcase just returns, which ends up in
> gdb_exit, tearing down GDB, which kills the inferior, and is thus
> equivalent to #3 above.
>
> Vis:
>
> $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
> ...
> (top-gdb) r
> ...
> (gdb) b main
> ...
> (gdb) r
> ...
> Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
> 69 argv0 = argv[0];
> (gdb) c
> Continuing.
> [New Thread 0x7ffff7d89700 (LWP 2506975)]
> Other going in exec.
> Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
> process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>
> Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
> 28 foo ();
> (gdb) k
> ...
> Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
> 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
> 393 return m_suspend.waitstatus_pending_p;
> (top-gdb) bt
> #0 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
> #1 0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
> #2 0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
> #3 0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
> #4 0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
> #5 0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
> #6 0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
> #7 0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
> #8 0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
> ...
As I mentioned in my other message, this backtrace includes
kill_unfollowed_child_callback, which doesn't exist yet! I think that's
OK though, the text before the backtrace does make it clear that you saw
this problem only after applying a later patch.
>
> The root of the problem is that when a non-leader LWP execs, it just
> changes its tid to the tgid, replacing the pre-exec leader thread,
> becoming the new leader. There's no thread exit event for the execing
> thread. It's as if the old pre-exec LWP vanishes without trace. The
> ptrace man page says:
>
> "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
> Stop the tracee at the next execve(2). A waitpid(2) by the
> tracer will return a status value such that
>
> status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
>
> If the execing thread is not a thread group leader, the thread
> ID is reset to thread group leader's ID before this stop.
> Since Linux 3.0, the former thread ID can be retrieved with
> PTRACE_GETEVENTMSG."
>
> When the core of GDB processes an exec events, it deletes all the
> threads of the inferior. But, that is too late -- deleting the thread
> does not delete the corresponding LWP, so we end leaving the pre-exec
> non-leader LWP stale in the LWP list. That's what leads to the crash
> above -- linux_nat_target::kill iterates over all LWPs, and after the
> patch in question, that code will look for the corresponding
> thread_info for each LWP. For the pre-exec non-leader LWP still
> listed, won't find one.
>
> This patch fixes it, by deleting the pre-exec non-leader LWP (and
> thread) from the LWP/thread lists as soon as we get an exec event out
> of ptrace.
>
> GDBserver does not need an equivalent fix, because it is already doing
> this, as side effect of mourning the pre-exec process, in
> gdbserver/linux-low.cc:
>
> else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
> {
> ...
> /* Delete the execing process and all its threads. */
> mourn (proc);
> switch_to_thread (nullptr);
>
>
> The crash with gdb.threads/step-over-exec.exp is not observable on
> newer systems, which postdate the glibc change to move "libpthread.so"
> internals to "libc.so.6", because right after the exec, GDB traps a
> load event for "libc.so.6", which leads to GDB trying to open
> libthread_db for the post-exec inferior, and, on such systems that
> succeeds. When we load libthread_db, we call
> linux_stop_and_wait_all_lwps, which, as the name suggests, stops all
> lwps, and then waits to see their stops. While doing this, GDB
> detects that the pre-exec stale LWP is gone, and deletes it.
>
> If we use "catch exec" to stop right at the exec before the
> "libc.so.6" load event ever happens, and issue "kill" right there,
> then GDB crashes on newer systems as well. So instead of tweaking
> gdb.threads/step-over-exec.exp to cover the fix, add a new
> gdb.threads/threads-after-exec.exp testcase that uses "catch exec".
Maybe it's worth mentioning that because the crash itself only happens
once a later patch is applied we use 'maint info linux-lwps' to reveal
the issue for now?
>
> Also tweak a comment in infrun.c:follow_exec referring to how
> linux-nat.c used to behave, as it would become stale otherwise.
>
> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
> ---
> gdb/infrun.c | 8 +--
> gdb/linux-nat.c | 15 ++++
> .../gdb.threads/threads-after-exec.exp | 70 +++++++++++++++++++
Oops, this diff is missing the two source files for this test (.c and
-execd.c). I was able to figure something out though so I could test
the rest of this patch :)
> 3 files changed, 88 insertions(+), 5 deletions(-)
> create mode 100644 gdb/testsuite/gdb.threads/threads-after-exec.exp
>
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index abe49ae0f2f..93edc224622 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -1224,13 +1224,11 @@ follow_exec (ptid_t ptid, const char *exec_file_target)
> some other thread does the exec, and even if the main thread was
> stopped or already gone. We may still have non-leader threads of
> the process on our list. E.g., on targets that don't have thread
> - exit events (like remote); or on native Linux in non-stop mode if
> - there were only two threads in the inferior and the non-leader
> - one is the one that execs (and nothing forces an update of the
> - thread list up to here). When debugging remotely, it's best to
> + exit events (like remote) and nothing forces an update of the
> + thread list up to here. When debugging remotely, it's best to
> avoid extra traffic, when possible, so avoid syncing the thread
> list with the target, and instead go ahead and delete all threads
> - of the process but one that reported the event. Note this must
> + of the process but the one that reported the event. Note this must
> be done before calling update_breakpoints_after_exec, as
> otherwise clearing the threads' resources would reference stale
> thread breakpoints -- it may have been one of these threads that
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 68816ddc999..90ac94440b8 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -2001,6 +2001,21 @@ linux_handle_extended_wait (struct lwp_info *lp, int status)
> thread execs, it changes its tid to the tgid, and the old
> tgid thread might have not been resumed. */
> lp->resumed = 1;
> +
> + /* All other LWPs are gone now. We'll have received a thread
> + exit notification for all threads other the execing one.
> + That one, if it wasn't the leader, just silently changes its
> + tid to the tgid, and the previous leader vanishes. Since
> + Linux 3.0, the former thread ID can be retrieved with
> + PTRACE_GETEVENTMSG, but since we support older kernels, don't
> + bother with it, and just walk the LWP list. Even with
> + PTRACE_GETEVENTMSG, we'd still need to lookup the
> + corresponding LWP object, and it would be an extra ptrace
> + syscall, so this way may even be more efficient. */
> + for (lwp_info *other_lp : all_lwps_safe ())
> + if (other_lp != lp && other_lp->ptid.pid () == lp->ptid.pid ())
> + exit_lwp (other_lp);
> +
> return 0;
> }
>
> diff --git a/gdb/testsuite/gdb.threads/threads-after-exec.exp b/gdb/testsuite/gdb.threads/threads-after-exec.exp
> new file mode 100644
> index 00000000000..824dda349a6
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/threads-after-exec.exp
> @@ -0,0 +1,70 @@
> +# Copyright 2023 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program. If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test that after an exec of a non-leader thread, we don't leave the
> +# non-leader thread listed in internal thread lists, causing problems.
> +
> +standard_testfile .c -execd.c
> +
> +proc do_test { } {
> + global srcdir subdir srcfile srcfile2 binfile testfile
> + global decimal
> +
> + # Compile main binary (the one that does the exec).
> + if {[gdb_compile_pthreads $srcdir/$subdir/$srcfile $binfile \
> + executable {debug}] != "" } {
> + return -1
> + }
You can do:
if {[build_executable "failed to build main executable" \
$binfile $srcfile {debug pthread}] == -1} {
return -1
}
> +
> + # Compile the second binary (the one that gets exec'd).
> + if {[gdb_compile $srcdir/$subdir/$srcfile2 $binfile-execd \
> + executable {debug}] != "" } {
> + return -1
> + }
And:
if {[build_executable "failed to build execd executable" \
$binfile-execd $srcfile2 {debug}] == -1} {
return -1
}
I thought we were moving away from calling the gdb_compile* functions
directly.
Assuming the missing source files are added, this all looks great.
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Thanks,
Andrew
> +
> + clean_restart $binfile
> +
> + if ![runto_main] {
> + return
> + }
> +
> + gdb_test "catch exec" "Catchpoint $decimal \\(exec\\)"
> +
> + gdb_test "continue" "Catchpoint $decimal .*" "continue until exec"
> +
> + # Confirm we only have one thread in the thread list.
> + gdb_test "info threads" "\\* 1\[ \t\]+\[^\r\n\]+.*"
> +
> + if {[istarget *-*-linux*] && [gdb_is_target_native]} {
> + # Confirm there's only one LWP in the list as well, and that
> + # it is bound to thread 1.1.
> + set inf_pid [get_inferior_pid]
> + gdb_test_multiple "maint info linux-lwps" "" {
> + -wrap -re "Thread ID *\r\n$inf_pid\.$inf_pid\.0\[ \t\]+1\.1 *" {
> + pass $gdb_test_name
> + }
> + }
> + }
> +
> + # Test that GDB is able to kill the inferior. This used to crash
> + # on native Linux as GDB did not dispose of the pre-exec LWP for
> + # the non-leader (and that LWP did not have a matching thread in
> + # the core thread list).
> + gdb_test "with confirm off -- kill" \
> + "\\\[Inferior 1 (.*) killed\\\]" \
> + "kill inferior"
> +}
> +
> +do_test
> --
> 2.36.0
next prev parent reply other threads:[~2023-05-26 15:05 UTC|newest]
Thread overview: 100+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-12 20:30 [PATCH 00/31] Step over thread clone and thread exit Pedro Alves
2022-12-12 20:30 ` [PATCH 01/31] displaced step: pass down target_waitstatus instead of gdb_signal Pedro Alves
2023-02-03 10:44 ` Andrew Burgess
2023-03-10 17:15 ` Pedro Alves
2023-03-16 16:07 ` Andrew Burgess
2023-03-22 21:29 ` Andrew Burgess
2023-03-23 15:15 ` Pedro Alves
2023-03-27 12:40 ` Andrew Burgess
2023-03-27 16:21 ` Pedro Alves
2022-12-12 20:30 ` [PATCH 02/31] linux-nat: introduce pending_status_str Pedro Alves
2023-02-03 12:00 ` Andrew Burgess
2023-03-10 17:15 ` Pedro Alves
2023-03-16 16:19 ` Andrew Burgess
2023-03-27 18:05 ` Pedro Alves
2022-12-12 20:30 ` [PATCH 03/31] gdb/linux: Delete all other LWPs immediately on ptrace exec event Pedro Alves
2023-03-21 14:50 ` Andrew Burgess
2023-04-04 13:57 ` Pedro Alves
2023-04-14 19:29 ` Pedro Alves
2023-05-26 15:04 ` Andrew Burgess [this message]
2023-11-13 14:04 ` Pedro Alves
2023-05-26 14:45 ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 04/31] Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONED Pedro Alves
2023-02-04 15:38 ` Andrew Burgess
2023-03-10 17:16 ` Pedro Alves
2023-03-21 16:06 ` Andrew Burgess
2023-11-13 14:05 ` Pedro Alves
2022-12-12 20:30 ` [PATCH 05/31] Support clone events in the remote protocol Pedro Alves
2023-03-22 15:46 ` Andrew Burgess
2023-11-13 14:05 ` Pedro Alves
2022-12-12 20:30 ` [PATCH 06/31] Avoid duplicate QThreadEvents packets Pedro Alves
2023-05-26 15:53 ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 07/31] enum_flags to_string Pedro Alves
2023-01-30 20:07 ` Simon Marchi
2022-12-12 20:30 ` [PATCH 08/31] Thread options & clone events (core + remote) Pedro Alves
2023-01-31 12:25 ` Lancelot SIX
2023-03-10 19:16 ` Pedro Alves
2023-06-06 13:29 ` Andrew Burgess
2023-11-13 14:07 ` Pedro Alves
2022-12-12 20:30 ` [PATCH 09/31] Thread options & clone events (native Linux) Pedro Alves
2023-06-06 13:43 ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 10/31] Thread options & clone events (Linux GDBserver) Pedro Alves
2023-06-06 14:12 ` Andrew Burgess
2023-11-13 14:07 ` Pedro Alves
2022-12-12 20:30 ` [PATCH 11/31] gdbserver: Hide and don't detach pending clone children Pedro Alves
2023-06-07 16:10 ` Andrew Burgess
2023-11-13 14:08 ` Pedro Alves
2022-12-12 20:30 ` [PATCH 12/31] Remove gdb/19675 kfails (displaced stepping + clone) Pedro Alves
2023-06-07 17:08 ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 13/31] Add test for stepping over clone syscall Pedro Alves
2023-06-07 17:42 ` Andrew Burgess
2023-11-13 14:09 ` Pedro Alves
2022-12-12 20:30 ` [PATCH 14/31] all-stop/synchronous RSP support thread-exit events Pedro Alves
2023-06-07 17:52 ` Andrew Burgess
2023-11-13 14:11 ` Pedro Alves
2023-12-15 18:15 ` Pedro Alves
2022-12-12 20:30 ` [PATCH 15/31] gdbserver/linux-low.cc: Ignore event_ptid if TARGET_WAITKIND_IGNORE Pedro Alves
2022-12-12 20:30 ` [PATCH 16/31] Move deleting thread on TARGET_WAITKIND_THREAD_EXITED to core Pedro Alves
2023-06-08 12:27 ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 17/31] Introduce GDB_THREAD_OPTION_EXIT thread option, fix step-over-thread-exit Pedro Alves
2023-06-08 13:17 ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 18/31] Implement GDB_THREAD_OPTION_EXIT support for Linux GDBserver Pedro Alves
2023-06-08 14:14 ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 19/31] Implement GDB_THREAD_OPTION_EXIT support for native Linux Pedro Alves
2023-06-08 14:17 ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 20/31] gdb: clear step over information on thread exit (PR gdb/27338) Pedro Alves
2023-06-08 15:29 ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 21/31] stop_all_threads: (re-)enable async before waiting for stops Pedro Alves
2023-06-08 15:49 ` Andrew Burgess
2023-11-13 14:12 ` Pedro Alves
2022-12-12 20:30 ` [PATCH 22/31] gdbserver: Queue no-resumed event after thread exit Pedro Alves
2023-06-08 18:16 ` Andrew Burgess
2023-11-13 14:12 ` Pedro Alves
2022-12-12 20:30 ` [PATCH 23/31] Don't resume new threads if scheduler-locking is in effect Pedro Alves
2023-06-08 18:24 ` Andrew Burgess
2023-11-13 14:12 ` Pedro Alves
2022-12-12 20:30 ` [PATCH 24/31] Report thread exit event for leader if reporting thread exit events Pedro Alves
2023-06-09 13:11 ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 25/31] Ignore failure to read PC when resuming Pedro Alves
2023-06-10 10:33 ` Andrew Burgess
2023-11-13 14:13 ` Pedro Alves
2022-12-12 20:30 ` [PATCH 26/31] gdb/testsuite/lib/my-syscalls.S: Refactor new SYSCALL macro Pedro Alves
2023-06-10 10:33 ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 27/31] Testcases for stepping over thread exit syscall (PR gdb/27338) Pedro Alves
2023-06-12 9:53 ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 28/31] Document remote clone events, and QThreadOptions packet Pedro Alves
2023-06-05 15:53 ` Andrew Burgess
2023-11-13 14:13 ` Pedro Alves
2023-06-12 12:06 ` Andrew Burgess
2023-11-13 14:15 ` Pedro Alves
2022-12-12 20:30 ` [PATCH 29/31] inferior::clear_thread_list always silent Pedro Alves
2023-06-12 12:20 ` Andrew Burgess
2022-12-12 20:31 ` [PATCH 30/31] Centralize "[Thread ...exited]" notifications Pedro Alves
2023-02-04 16:05 ` Andrew Burgess
2023-03-10 17:21 ` Pedro Alves
2023-02-16 15:40 ` Andrew Burgess
2023-06-12 12:23 ` Andrew Burgess
2022-12-12 20:31 ` [PATCH 31/31] Cancel execution command on thread exit, when stepping, nexting, etc Pedro Alves
2023-06-12 13:12 ` Andrew Burgess
2023-01-24 19:47 ` [PATCH v3 00/31] Step over thread clone and thread exit Pedro Alves
2023-11-13 14:24 ` [PATCH " Pedro Alves
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87pm6n5e20.fsf@redhat.com \
--to=aburgess@redhat.com \
--cc=gdb-patches@sourceware.org \
--cc=pedro@palves.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).