From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from barracuda.ebox.ca (barracuda.ebox.ca [96.127.255.19]) by sourceware.org (Postfix) with ESMTPS id 45FF83858430 for ; Sun, 24 Apr 2022 03:20:59 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 45FF83858430 X-ASG-Debug-ID: 1650770452-0c856e06abca5fb0001-fS2M51 Received: from smtp.ebox.ca (smtp.ebox.ca [96.127.255.82]) by barracuda.ebox.ca with ESMTP id vItlb1m0fLlIjd8B (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 23 Apr 2022 23:20:52 -0400 (EDT) X-Barracuda-Envelope-From: simon.marchi@polymtl.ca X-Barracuda-RBL-Trusted-Forwarder: 96.127.255.82 Received: from simark.localdomain (192-222-157-6.qc.cable.ebox.net [192.222.157.6]) by smtp.ebox.ca (Postfix) with ESMTP id 0FAF3441D65; Sat, 23 Apr 2022 23:20:52 -0400 (EDT) From: Simon Marchi X-Barracuda-RBL-IP: 192.222.157.6 X-Barracuda-Effective-Source-IP: 192-222-157-6.qc.cable.ebox.net[192.222.157.6] X-Barracuda-Apparent-Source-IP: 192.222.157.6 To: gdb-patches@sourceware.org Subject: [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply Date: Sat, 23 Apr 2022 23:20:49 -0400 X-ASG-Orig-Subj: [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply Message-Id: <20220424032049.1021263-3-simon.marchi@polymtl.ca> X-Mailer: git-send-email 2.35.2 In-Reply-To: <20220424032049.1021263-1-simon.marchi@polymtl.ca> References: <20220424032049.1021263-1-simon.marchi@polymtl.ca> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Barracuda-Connect: smtp.ebox.ca[96.127.255.82] X-Barracuda-Start-Time: 1650770452 X-Barracuda-Encrypted: DHE-RSA-AES256-SHA X-Barracuda-URL: https://96.127.255.19:443/cgi-mod/mark.cgi X-Virus-Scanned: by bsmtpd at ebox.ca X-Barracuda-Scan-Msg-Size: 11056 X-Barracuda-BRTS-Status: 1 X-Barracuda-Spam-Score: 0.50 X-Barracuda-Spam-Status: No, SCORE=0.50 using global scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=8.0 tests=WEIRD_PORT X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.97556 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.50 WEIRD_PORT URI: Uses non-standard port number for HTTP X-Spam-Status: No, score=-3612.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_QUARANTINE, KAM_DMARC_STATUS, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_SOFTFAIL, TXREP, WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 24 Apr 2022 03:21:01 -0000 Commit 152a17495663 ("gdb: prune inferiors at end of fetch_inferior_event, fix intermittent failure of gdb.threads/fork-plus-threads.exp") broke gdb.base/vfork-follow-parent.exp with the native-gdbserver board: (gdb) PASS: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: inferior 1 print unblock_parent = 1^M $1 = 1^M (gdb) PASS: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: print unblock_parent = 1 continue^M Continuing.^M terminate called after throwing an instance of 'gdb_exception_error'^M I can manually reproduce the issue by running (with gdbserver running on port 1234) with the following command (just the commands that the test does as a one liner): $ ./gdb -q --data-directory=data-directory \ testsuite/outputs/gdb.base/step-over-syscall/step-over-vfork \ -ex "tar rem :1234" \ -ex "b main" \ -ex c \ -ex "d 1" \ -ex "set displaced-stepping off" \ -ex "b *0x7ffff7d7ac5a if main == 0" -ex "set detach-on-fork off" -ex "set follow-fork-mode child" -ex c -ex "inferior 1" -ex "b marker" -ex c ... where 0x7ffff7d7ac5a is the exact address of the vfork syscall (which can be found by looking at gdb.log). The important part of the above is that a vfork syscall creates inferior 2, then inferior 2 executes until exit, then we switch back to inferior 1 and try to resume it. The uncaught exception happens here: #4 0x00005596969d81a9 in error (fmt=0x559692da9e40 "Cannot execute this command while the target is running.\nUse the \"interrupt\" command to stop the target\nand then try again.") at /home/simark/src/binutils-gdb/gdbsupport/errors.cc:43 #5 0x0000559695af6f66 in remote_target::putpkt_binary (this=0x617000038080, buf=0x559692da4380 "qSymbol::", cnt=9) at /home/simark/src/binutils-gdb/gdb/remote.c:9560 #6 0x0000559695af6aaf in remote_target::putpkt (this=0x617000038080, buf=0x559692da4380 "qSymbol::") at /home/simark/src/binutils-gdb/gdb/remote.c:9518 #7 0x0000559695ab50dc in remote_target::remote_check_symbols (this=0x617000038080) at /home/simark/src/binutils-gdb/gdb/remote.c:5141 #8 0x0000559695b3cccf in remote_new_objfile (objfile=0x0) at /home/simark/src/binutils-gdb/gdb/remote.c:14600 #9 0x0000559693bc52a9 in std::__invoke_impl (__f=@0x61b0000167f8: 0x559695b3cb1d ) at /usr/include/c++/11.2.0/bits/invoke.h:61 #10 0x0000559693bb2848 in std::__invoke_r (__fn=@0x61b0000167f8: 0x559695b3cb1d ) at /usr/include/c++/11.2.0/bits/invoke.h:111 #11 0x0000559693b8dddf in std::_Function_handler::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7ffe0bae0590: 0x0) at /usr/include/c++/11.2.0/bits/std_function.h:291 #12 0x00005596956374b2 in std::function::operator()(objfile*) const (this=0x61b0000167f8, __args#0=0x0) at /usr/include/c++/11.2.0/bits/std_function.h:560 #13 0x0000559695633c64 in gdb::observers::observable::notify (this=0x55969ef5c480 , args#0=0x0) at /home/simark/src/binutils-gdb/gdb/../gdbsupport/observable.h:150 #14 0x0000559695df6cc2 in clear_symtab_users (add_flags=...) at /home/simark/src/binutils-gdb/gdb/symfile.c:2873 #15 0x000055969574c263 in program_space::~program_space (this=0x6120000c8a40, __in_chrg=) at /home/simark/src/binutils-gdb/gdb/progspace.c:154 #16 0x0000559694fc086b in delete_inferior (inf=0x61700003bf80) at /home/simark/src/binutils-gdb/gdb/inferior.c:205 #17 0x0000559694fc341f in prune_inferiors () at /home/simark/src/binutils-gdb/gdb/inferior.c:390 #18 0x0000559695017ada in fetch_inferior_event () at /home/simark/src/binutils-gdb/gdb/infrun.c:4293 #19 0x0000559694f629e6 in inferior_event_handler (event_type=INF_REG_EVENT) at /home/simark/src/binutils-gdb/gdb/inf-loop.c:41 #20 0x0000559695b3b0e3 in remote_async_serial_handler (scb=0x6250001ef100, context=0x6170000380a8) at /home/simark/src/binutils-gdb/gdb/remote.c:14466 #21 0x0000559695c59eb7 in run_async_handler_and_reschedule (scb=0x6250001ef100) at /home/simark/src/binutils-gdb/gdb/ser-base.c:138 #22 0x0000559695c5a42a in fd_event (error=0, context=0x6250001ef100) at /home/simark/src/binutils-gdb/gdb/ser-base.c:189 #23 0x00005596969d9ebf in handle_file_event (file_ptr=0x60700005af40, ready_mask=1) at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:574 #24 0x00005596969da7fa in gdb_wait_for_event (block=0) at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:700 #25 0x00005596969d8539 in gdb_do_one_event () at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:212 If I enable "set debug infrun" just before the last continue, we see: (gdb) continue Continuing. [infrun] clear_proceed_status_thread: 965604.965604.0 [infrun] proceed: enter [infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT [infrun] scoped_disable_commit_resumed: reason=proceeding [infrun] start_step_over: enter [infrun] start_step_over: stealing global queue of threads to step, length = 0 [infrun] operator(): step-over queue now empty [infrun] start_step_over: exit [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [965604.965604.0] at 0x7ffff7d7ac5c [infrun] do_target_resume: resume_ptid=965604.0.0, step=0, sig=GDB_SIGNAL_0 [infrun] prepare_to_wait: prepare_to_wait [infrun] reset: reason=proceeding [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target remote [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target remote [infrun] proceed: exit [infrun] fetch_inferior_event: enter [infrun] scoped_disable_commit_resumed: reason=handling event [infrun] do_target_wait: Found 2 inferiors, starting at #1 [infrun] random_pending_event_thread: None found. [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) = [infrun] print_target_wait_results: 965604.965604.0 [Thread 965604.965604], [infrun] print_target_wait_results: status->kind = VFORK_DONE [infrun] handle_inferior_event: status->kind = VFORK_DONE [infrun] context_switch: Switching context from 0.0.0 to 965604.965604.0 [infrun] handle_vfork_done: not waiting for a vfork-done event [infrun] start_step_over: enter [infrun] start_step_over: stealing global queue of threads to step, length = 0 [infrun] operator(): step-over queue now empty [infrun] start_step_over: exit [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [965604.965604.0] at 0x7ffff7d7ac5c [infrun] do_target_resume: resume_ptid=965604.0.0, step=0, sig=GDB_SIGNAL_0 [infrun] prepare_to_wait: prepare_to_wait [infrun] reset: reason=handling event [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target remote [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target remote terminate called after throwing an instance of 'gdb_exception_error' What happens is: - After doing the "continue" on inferior 1, the remote target gives us a VFORK_DONE event. The core ignores it and resumes inferior 1. - Since prune_inferiors is now called after each handled event, in fetch_inferior_event, it is called after we handled that VFORK_DONE event and resumed inferior 1. - Inferior 2 is pruned, which (see backtrace above) causes its program space to be deleted, which clears the symtabs for that program space, which calls the new_objfile observable and remote_new_objfile observer (with a nullptr objfile, to indicate that the previously loaded symbols have been discarded), which calls remote_check_symbols. remote_check_symbols is the function that sends the qSymbol packet, to let the remote side ask for symbol addresses. The problem is that the remote target is working in all-stop / sync mode and is currently resumed. It has sent a vCont packet to resume the target and is waiting for a stop reply. It can't send any packets in the mean time. That causes the exception to be thrown. This wasn't a problem before, when prune_inferiors was called in normal_stop, because it was always called at a time the target was not resumed. I realized there are simpler ways to trigger this problem. You only need to load a symbol file (which triggers remote_check_symbols) while sync execution is ongoing: $ ./gdb -q --data-directory=data-directory -nx a.out \ -ex "tar ext :1234" \ -ex "tb main" \ -ex c \ -ex "c&" \ -ex "add-inferior" \ -ex "inferior 2" \ -ex "file /bin/ls" * aborts because of the same exception * My initial fix for it was to return early from new_objfile if objfile is nullptr, since that means that we removed some or all symbols. I don't think that's a good idea, because it might be important to send qSymbol after removing symbols. This could help a remote target realize that the symbols it used previously are now gone. I don't think that GDBserver uses that, but I think it could be used like that in theory. It also wouldn't help with that last example, since objfile wouldn't be nullptr. The proposed fix is instead to return early from remote_check_symbols if rs->waiting_for_stop_reply is set. If I understand correctly, it's not really necessary to use the more complete check that putpkt_binary uses: if (!target_is_non_stop_p () && target_is_async_p () && rs->waiting_for_stop_reply) ... because if rs->waiting_for_stop_reply is set, it means we're using the all-stop / sync mode. With this patch, I see the testsuite going back to the same state as before 152a17495663. Note that when I try my "load symbol file while background execution" example with this patch applied, GDB still crashes with the same uncaught exception, but it's another method that causes it (remote_hostio_send_command), meaning we are getting further in the execution. But it would be out of the scope of this patch to fix that. Change-Id: Ica643145bcc03115248290fd310cadab8ec8371c --- gdb/remote.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/gdb/remote.c b/gdb/remote.c index 562cc586f2bf..a104f4a57a80 100644 --- a/gdb/remote.c +++ b/gdb/remote.c @@ -5124,6 +5124,10 @@ remote_target::remote_check_symbols () if (!target_has_execution ()) return; + remote_state *rs = get_remote_state (); + if (rs->waiting_for_stop_reply) + return; + if (packet_support (PACKET_qSymbol) == PACKET_DISABLE) return; -- 2.35.2