From: Tom de Vries <tdevries@suse.de>
To: gdb-patches@sourceware.org
Subject: [RFC 1/3] [gdb/dap] Fix exit race
Date: Wed, 7 Feb 2024 10:02:22 +0100
Message-ID: <20240207090224.27521-2-tdevries@suse.de>
In-Reply-To: <20240207090224.27521-1-tdevries@suse.de>

When running test-case gdb.dap/eof.exp, we're likely to get a coredump due to
a segfault in new_threadstate.

At the point of the core dump, the gdb main thread looks like:
...
(gdb) bt
#0 0x0000fffee30d2280 in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x0000fffee3085800 [PAC] in raise () from /lib64/libc.so.6
#2 0x00000000007b03e8 [PAC] in handle_fatal_signal (sig=11)
at gdb/event-top.c:926
#3 0x00000000007b0470 in handle_sigsegv (sig=11)
at gdb/event-top.c:976
#4 <signal handler called>
#5 0x0000fffee3a4db14 in new_threadstate () from /lib64/libpython3.12.so.1.0
#6 0x0000fffee3ab0548 [PAC] in PyGILState_Ensure () from /lib64/libpython3.12.so.1.0
#7 0x0000000000a6d034 [PAC] in gdbpy_gil::gdbpy_gil (this=0xffffcb279738)
at gdb/python/python-internal.h:787
#8 0x0000000000ab87ac in gdbpy_event::~gdbpy_event (this=0xfffea8001ee0,
__in_chrg=<optimized out>) at gdb/python/python.c:1051
#9 0x0000000000ab9460 in std::_Function_base::_Base_manager<...>::_M_destroy
(__victim=...) at /usr/include/c++/13/bits/std_function.h:175
#10 0x0000000000ab92dc in std::_Function_base::_Base_manager<...>::_M_manager
(__dest=..., __source=..., __op=std::__destroy_functor)
at /usr/include/c++/13/bits/std_function.h:203
#11 0x0000000000ab8f14 in std::_Function_handler<...>::_M_manager(...) (...)
at /usr/include/c++/13/bits/std_function.h:282
#12 0x000000000042dd9c in std::_Function_base::~_Function_base (this=0xfffea8001c10,
__in_chrg=<optimized out>) at /usr/include/c++/13/bits/std_function.h:244
#13 0x000000000042e654 in std::function<void ()>::~function() (this=0xfffea8001c10,
__in_chrg=<optimized out>) at /usr/include/c++/13/bits/std_function.h:334
#14 0x0000000000b68e60 in std::_Destroy<std::function<void ()> >(...) (...)
at /usr/include/c++/13/bits/stl_construct.h:151
#15 0x0000000000b68cd0 in std::_Destroy_aux<false>::__destroy<...>(...) (...)
at /usr/include/c++/13/bits/stl_construct.h:163
#16 0x0000000000b689d8 in std::_Destroy<...>(...) (...)
at /usr/include/c++/13/bits/stl_construct.h:196
#17 0x0000000000b68414 in std::_Destroy<...>(...) (...)
at /usr/include/c++/13/bits/alloc_traits.h:948
#18 std::vector<...>::~vector() (this=0x2a183c8 <runnables>)
at /usr/include/c++/13/bits/stl_vector.h:732
#19 0x0000fffee3088370 in __run_exit_handlers () from /lib64/libc.so.6
#20 0x0000fffee3088450 [PAC] in exit () from /lib64/libc.so.6
#21 0x0000000000c95600 [PAC] in quit_force (exit_arg=0x0, from_tty=0)
at gdb/top.c:1822
#22 0x0000000000609140 in quit_command (args=0x0, from_tty=0)
at gdb/cli/cli-cmds.c:508
#23 0x0000000000c926a4 in quit_cover () at gdb/top.c:300
#24 0x00000000007b09d4 in async_disconnect (arg=0x0)
at gdb/event-top.c:1230
#25 0x0000000000548acc in invoke_async_signal_handlers ()
at gdb/async-event.c:234
#26 0x000000000157d2d4 in gdb_do_one_event (mstimeout=-1)
at gdbsupport/event-loop.cc:199
#27 0x0000000000943a84 in start_event_loop () at gdb/main.c:401
#28 0x0000000000943bfc in captured_command_loop () at gdb/main.c:465
#29 0x000000000094567c in captured_main (data=0xffffcb279d08)
at gdb/main.c:1335
#30 0x0000000000945700 in gdb_main (args=0xffffcb279d08)
at gdb/main.c:1354
#31 0x0000000000423ab4 in main (argc=14, argv=0xffffcb279e98)
at gdb/gdb.c:39
...

The direct cause of the segfault is calling PyGILState_Ensure after
calling Py_Finalize.

AFAICT the problem is a race between the gdb main thread and DAP's JSON writer
thread.

On one side, we have the following events:
- DAP's JSON reader thread reads an EOF
- it lets DAP's JSON writer thread know by writing None into its queue
- DAP's JSON writer thread sees the None in its queue, and calls
  send_gdb("quit")
- a corresponding gdbpy_event is deposited in the runnables vector, to be
  run by the gdb main thread
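
To make this first chain of events concrete, here is a minimal Python model
of it. It is a sketch, not the gdb.dap code: start_json_reader, post_to_main
and the one-JSON-object-per-line framing are assumptions for illustration
(real DAP messages use Content-Length headers, and the real reader dispatches
requests rather than echoing them to the writer). post_to_main stands in for
send_gdb, and the runnables list stands in for gdb's runnables vector.

```python
import io
import json
import queue
import threading


def start_json_reader(stream, writer_queue):
    """Hypothetical reader: forward each JSON line to the writer's queue,
    then put None to signal EOF."""

    def run():
        for line in stream:  # iteration ends at EOF
            writer_queue.put(json.loads(line))
        writer_queue.put(None)  # EOF: let the writer thread know

    thread = threading.Thread(target=run)
    thread.start()
    return thread


def start_json_writer(stream, writer_queue, post_to_main):
    """Hypothetical writer: number and write each object; on None, post a
    "quit" request for the main thread, standing in for send_gdb("quit")."""

    def run():
        seq = 1
        while True:
            obj = writer_queue.get()
            if obj is None:
                post_to_main("quit")  # the call this patch removes
                break
            obj["seq"] = seq
            seq = seq + 1
            stream.write(json.dumps(obj) + "\n")

    thread = threading.Thread(target=run)
    thread.start()
    return thread


# Pending work for the main thread, standing in for the runnables vector.
runnables = []
runnables_lock = threading.Lock()


def post_to_main(item):
    with runnables_lock:
        runnables.append(item)


out = io.StringIO()
q = queue.Queue()
reader = start_json_reader(io.StringIO('{"event": "a"}\n'), q)
writer = start_json_writer(out, q, post_to_main)
reader.join()
writer.join()
print(out.getvalue(), end="")  # {"event": "a", "seq": 1}
print(runnables)  # ['quit']
```

Because the quit request travels through the main thread's queue, whether it
is run or merely destroyed at exit depends on how this thread's timing
interleaves with the main thread's shutdown.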

On the other side, we have the following events:
- the gdb main thread receives a SIGHUP
- the corresponding handler calls quit_force, which calls do_final_cleanups
- one of the final cleanups is finalize_python, which calls Py_Finalize
- quit_force calls exit, which triggers the exit handlers
- one of the exit handlers is the destructor of the runnables vector
- destruction of the vector triggers destruction of the remaining element
- the remaining element is a gdbpy_event, and the destructor (indirectly)
  calls PyGILState_Ensure
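
The ordering problem on this side can be modeled the same way. In this
hypothetical Python sketch, FakeInterpreter, FakeEvent and this quit_force are
illustrative stand-ins (for the embedded interpreter, gdbpy_event and gdb's
C++ quit_force respectively), and an exception replaces the segfault. If an
event is still queued when the interpreter is finalized, destroying it
afterwards fails:

```python
class FakeInterpreter:
    """Hypothetical stand-in for the embedded Python interpreter."""

    def __init__(self):
        self.finalized = False

    def finalize(self):
        # Plays the role of Py_Finalize (reached via finalize_python).
        self.finalized = True

    def ensure_gil(self):
        # Plays the role of PyGILState_Ensure, which must not be called
        # after Py_Finalize.
        if self.finalized:
            raise RuntimeError("GIL requested after finalization")


class FakeEvent:
    """Plays the role of gdbpy_event: destroying it needs the GIL."""

    def __init__(self, interp):
        self.interp = interp

    def destroy(self):
        self.interp.ensure_gil()


def quit_force(interp, runnables):
    # do_final_cleanups -> finalize_python -> Py_Finalize
    interp.finalize()
    # exit() -> exit handlers -> destructor of the runnables vector,
    # which destroys any element still queued.
    for event in runnables:
        event.destroy()
    runnables.clear()


interp = FakeInterpreter()
runnables = [FakeEvent(interp)]  # posted by the writer thread just in time
try:
    quit_force(interp, runnables)
except RuntimeError as exc:
    print("crash:", exc)  # crash: GIL requested after finalization
```

In gdb the failing call happens inside a destructor run by exit, so there is
no exception to catch; the process dies in new_threadstate instead.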

My first attempt at fixing this was to write a finalize_runnables function and
call it from quit_force, similar to finalize_values, to ensure that the
gdbpy_event destructor is called before Py_Finalize. However, I still ran into
the same problem, because the gdbpy_event was posted after finalize_runnables
was called. I managed to handle this by ignoring run_on_main_thread calls made
after finalize_runnables, but it made me wonder if there's a better way.
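
Roughly, the finalize_runnables idea amounts to the following, sketched in
Python under assumptions: the Runnables class and its destroy callback are
hypothetical, while gdb's actual runnables vector and run_on_main_thread are
C++. Pending items are drained before the interpreter is finalized, and posts
that arrive afterwards are dropped:

```python
import threading


class Runnables:
    """Hypothetical model of the runnables queue with a finalize step."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []
        self._finalized = False

    def run_on_main_thread(self, item):
        with self._lock:
            if self._finalized:
                # Late post, e.g. the writer thread's quit request arriving
                # after finalize: ignore it.
                return False
            self._items.append(item)
            return True

    def finalize(self, destroy):
        # Meant to be called from quit_force before the interpreter is
        # finalized, so destroying pending items is still safe.
        with self._lock:
            self._finalized = True
            pending, self._items = self._items, []
        for item in pending:
            destroy(item)


runnables = Runnables()
runnables.run_on_main_thread("event-1")
destroyed = []
runnables.finalize(destroyed.append)
print(destroyed)  # ['event-1']
print(runnables.run_on_main_thread("late-quit"))  # False
```

The lock matters: without it, a post could check the flag, then lose the race
with finalize and append to the already-drained list, leaving an item that is
only destroyed at exit, which is the same window described above.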

Then I tried simply removing send_gdb("quit"), and that worked as well, and
caused no regressions. So, either this is the easiest way to address this, or
we need to add a test-case that regresses when we remove it.

This RFC removes send_gdb("quit"). Perhaps the finalize_runnables approach is
better, perhaps something else is needed; I'm not sure.

Tested on aarch64-linux.
PR dap/31306
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31306
---
gdb/python/lib/gdb/dap/io.py | 1 -
1 file changed, 1 deletion(-)
diff --git a/gdb/python/lib/gdb/dap/io.py b/gdb/python/lib/gdb/dap/io.py
index 5149edae977..4edd504c727 100644
--- a/gdb/python/lib/gdb/dap/io.py
+++ b/gdb/python/lib/gdb/dap/io.py
@@ -68,7 +68,6 @@ def start_json_writer(stream, queue):
                 # This is an exit request. The stream is already
                 # flushed, so all that's left to do is request an
                 # exit.
-                send_gdb("quit")
                 break
             obj["seq"] = seq
             seq = seq + 1
--
2.35.3