public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix regressions caused by prune_threads patch
@ 2022-04-24  3:20 Simon Marchi
  2022-04-24  3:20 ` [PATCH 1/2] gdb/infrun: make fetch_inferior_event restore thread if exited or signalled Simon Marchi
  2022-04-24  3:20 ` [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply Simon Marchi
  0 siblings, 2 replies; 16+ messages in thread
From: Simon Marchi @ 2022-04-24  3:20 UTC (permalink / raw)
  To: gdb-patches

Fix regressions caused by 152a1749566 ("gdb: prune inferiors at end of
fetch_inferior_event, fix intermittent failure of
gdb.threads/fork-plus-threads.exp").  They are actually two pre-existing
issues in the code that are now uncovered.

Simon Marchi (2):
  gdb: make fetch_inferior_event restore thread if exited or signalled
  gdb/remote: return early from remote_check_symbols if waiting for stop
    reply

 gdb/infrun.c | 10 +++++++---
 gdb/remote.c |  4 ++++
 2 files changed, 11 insertions(+), 3 deletions(-)


base-commit: 69be4d89e30e0a6f0476ede2e34748330cd72445
-- 
2.35.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] gdb/infrun: make fetch_inferior_event restore thread if exited or signalled
  2022-04-24  3:20 [PATCH 0/2] Fix regressions caused by prune_threads patch Simon Marchi
@ 2022-04-24  3:20 ` Simon Marchi
  2022-04-25 17:21   ` Andrew Burgess
  2022-04-29 12:37   ` Pedro Alves
  2022-04-24  3:20 ` [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply Simon Marchi
  1 sibling, 2 replies; 16+ messages in thread
From: Simon Marchi @ 2022-04-24  3:20 UTC (permalink / raw)
  To: gdb-patches

Commit 152a1749566 ("gdb: prune inferiors at end of
fetch_inferior_event, fix intermittent failure of
gdb.threads/fork-plus-threads.exp") introduced some follow-fork-related
test failures, such as:

    info inferiors^M
      Num  Description       Connection           Executable        ^M
    * 1    process 634972    1 (native)           /home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork ^M
      2    process 634975    1 (native)           /home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork ^M
    (gdb) PASS: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: info inferiors
    inferior 2^M
    [Switching to inferior 2 [process 634975] (/home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork)]^M
    [Switching to thread 2.1 (Thread 0x7ffff7c9a740 (LWP 634975))]^M
    #0  0x00007ffff7d7abf7 in _Fork () from /usr/lib/libc.so.6^M
    (gdb) PASS: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: inferior 2
    continue^M
    Continuing.^M
    [Inferior 2 (process 634975) exited normally]^M
    [Switching to Thread 0x7ffff7c9a740 (LWP 634972)]^M
    (gdb) PASS: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: continue until exit at continue unfollowed inferior to end
    break callee^M
    Breakpoint 2 at 0x555555555160: file /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/foll-fork.c, line 9.^M
    (gdb) FAIL: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: break callee

What happens here is:

 - inferior 2 is selected
 - we continue, leading to inferior 2's exit
 - we set breakpoint, expect 2 locations, but only one location is
   resolved

Reading between the lines, we understand that inferior 2 got pruned,
when it shouldn't have been.

The issue can be reproduced by hand with:

    $ ./gdb -q --data-directory=data-directory testsuite/outputs/gdb.base/foll-fork/foll-fork -ex "set detach-on-fork off" -ex start -ex "next 2" -ex "inferior 2" -ex "set debug infrun"
    ...
    Temporary breakpoint 1, main () at /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/foll-fork.c:14
    14        int  v = 5;
    [New inferior 2 (process 637627)]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1".
    17        if (pid == 0) /* set breakpoint here */
    [Switching to inferior 2 [process 637627] (/home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork)]
    [Switching to thread 2.1 (Thread 0x7ffff7c9a740 (LWP 637627))]
    #0  0x00007ffff7d7abf7 in _Fork () from /usr/lib/libc.so.6
    (gdb) continue
    Continuing.
    [infrun] clear_proceed_status_thread: 637627.637627.0
    [infrun] proceed: enter
      [infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT
      [infrun] scoped_disable_commit_resumed: reason=proceeding
      [infrun] start_step_over: enter
        [infrun] start_step_over: stealing global queue of threads to step, length = 0
        [infrun] operator(): step-over queue now empty
      [infrun] start_step_over: exit
      [infrun] proceed: start: resuming threads, all-stop-on-top-of-non-stop
        [infrun] proceed: resuming 637627.637627.0
        [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [637627.637627.0] at 0x7ffff7d7abf7
        [infrun] do_target_resume: resume_ptid=637627.637627.0, step=0, sig=GDB_SIGNAL_0
        [infrun] infrun_async: enable=1
        [infrun] prepare_to_wait: prepare_to_wait
      [infrun] proceed: end: resuming threads, all-stop-on-top-of-non-stop
      [infrun] reset: reason=proceeding
      [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target native
      [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target native
      [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target native
    [infrun] proceed: exit
    [infrun] fetch_inferior_event: enter
      [infrun] scoped_disable_commit_resumed: reason=handling event
      [infrun] do_target_wait: Found 2 inferiors, starting at #1
      [infrun] random_pending_event_thread: None found.
      [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
      [infrun] print_target_wait_results:   637627.637627.0 [process 637627],
      [infrun] print_target_wait_results:   status->kind = EXITED, exit_status = 0
      [infrun] handle_inferior_event: status->kind = EXITED, exit_status = 0
    [Inferior 2 (process 637627) exited normally]
      [infrun] stop_waiting: stop_waiting
      [infrun] stop_all_threads: start: reason=presenting stop to user in all-stop, inf=-1
        [infrun] stop_all_threads: pass=0, iterations=0
        [infrun] stop_all_threads:   637624.637624.0 not executing
        [infrun] stop_all_threads: pass=1, iterations=1
        [infrun] stop_all_threads:   637624.637624.0 not executing
        [infrun] stop_all_threads: done
      [infrun] stop_all_threads: end: reason=presenting stop to user in all-stop, inf=-1
    [Switching to Thread 0x7ffff7c9a740 (LWP 637624)]
      [infrun] infrun_async: enable=0
      [infrun] reset: reason=handling event
      [infrun] maybe_set_commit_resumed_all_targets: not requesting commit-resumed for target native, no resumed threads
    (gdb) [infrun] fetch_inferior_event: exit
    (gdb) info inferiors
      Num  Description       Connection           Executable
    * 1    process 637624    1 (native)           /home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork
    (gdb) i th
      Id   Target Id                                      Frame
    * 1    Thread 0x7ffff7c9a740 (LWP 637624) "foll-fork" main () at /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/foll-fork.c:17

After handling the EXITED event for inferior 2, inferior 2 should have
stayed the current inferior, which should have prevented it from getting
pruned.  When debugging, we find that when getting at the
prune_inferiors call, the current inferior is inferior 1.  Further
debugging shows that prior to the call to
clean_up_just_stopped_threads_fsms, the current inferior is inferior 2,
and after, it's inferior 1.  Then, back in fetch_inferior_event, the
restore_thread object is disabled, due to:

	    /* If we got a TARGET_WAITKIND_NO_RESUMED event, then the
	       previously selected thread is gone.  We have two
	       choices - switch to no thread selected, or restore the
	       previously selected thread (now exited).  We chose the
	       later, just because that's what GDB used to do.  After
	       this, "info threads" says "The current thread <Thread
	       ID 2> has terminated." instead of "No thread
	       selected.".  */
	    if (!non_stop
		&& cmd_done
		&& ecs->ws.kind () != TARGET_WAITKIND_NO_RESUMED)
	      restore_thread.dont_restore ();

So in the end, inferior 1 stays current, and inferior 2 gets wrongfully
pruned.

I'd say clean_up_just_stopped_threads_fsms is the culprit here.  It
actually attempts to restore the event_thread to be current at the end,
after the loop (I presume the current thread on entry is always supposed
to be the event thread).  But in this case, the event is of kind EXITED,
and ecs->event_thread is not set, so the current inferior isn't
restored.

Fix that by using scoped_restore_current_thread.  If there is no current
thread, scoped_restore_current_thread will still restore the current
inferior, and that's what we want.

Random note: the thread_info object for inferior 2's thread is never
freed.  It is held (by refcount) by the restore_thread object in
fetch_inferior_event, while the inferior's thread list gets cleared, in
the exit event processing.  When the refcount reaches 0 (when the
restore_thread object is destroyed), there's nothing that actually
deletes the thread_info object.  And I think that nothing in GDB points
to it anymore, so it leaks.  I don't want to fix that in this patch, but
thought it would be good to mention it, in case somebody has an idea for
how to fix that.

Change-Id: Ibc7df543e2c46aad5f3b9250b28c3fb5912be4e8
---
 gdb/infrun.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 4e7ca803c798..6dccb7a2ff2a 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -4052,12 +4052,19 @@ reinstall_readline_callback_handler_cleanup ()
 static void
 clean_up_just_stopped_threads_fsms (struct execution_control_state *ecs)
 {
+  /* The first clean_up call below assumes the event thread is the current
+     one.  */
+  if (ecs->event_thread != nullptr)
+    gdb_assert (ecs->event_thread == inferior_thread ());
+
   if (ecs->event_thread != nullptr
       && ecs->event_thread->thread_fsm () != nullptr)
     ecs->event_thread->thread_fsm ()->clean_up (ecs->event_thread);
 
   if (!non_stop)
     {
+      scoped_restore_current_thread restore_thread;
+
       for (thread_info *thr : all_non_exited_threads ())
 	{
 	  if (thr->thread_fsm () == nullptr)
@@ -4068,9 +4075,6 @@ clean_up_just_stopped_threads_fsms (struct execution_control_state *ecs)
 	  switch_to_thread (thr);
 	  thr->thread_fsm ()->clean_up (thr);
 	}
-
-      if (ecs->event_thread != nullptr)
-	switch_to_thread (ecs->event_thread);
     }
 }
 
-- 
2.35.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply
  2022-04-24  3:20 [PATCH 0/2] Fix regressions caused by prune_threads patch Simon Marchi
  2022-04-24  3:20 ` [PATCH 1/2] gdb/infrun: make fetch_inferior_event restore thread if exited or signalled Simon Marchi
@ 2022-04-24  3:20 ` Simon Marchi
  2022-04-26 10:27   ` Andrew Burgess
  2022-04-29 12:50   ` Pedro Alves
  1 sibling, 2 replies; 16+ messages in thread
From: Simon Marchi @ 2022-04-24  3:20 UTC (permalink / raw)
  To: gdb-patches

Commit 152a17495663 ("gdb: prune inferiors at end of
fetch_inferior_event, fix intermittent failure of
gdb.threads/fork-plus-threads.exp") broke
gdb.base/vfork-follow-parent.exp with the native-gdbserver board:

    (gdb) PASS: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: inferior 1
    print unblock_parent = 1^M
    $1 = 1^M
    (gdb) PASS: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: print unblock_parent = 1
    continue^M
    Continuing.^M
    terminate called after throwing an instance of 'gdb_exception_error'^M

I can manually reproduce the issue by running (with gdbserver running on
port 1234) with the following command (just the commands that the test
does as a one liner):

    $ ./gdb -q --data-directory=data-directory \
          testsuite/outputs/gdb.base/step-over-syscall/step-over-vfork \
	  -ex "tar rem :1234" \
	  -ex "b main" \
	  -ex c \
	  -ex "d 1" \
	  -ex "set displaced-stepping off" \
	  -ex "b *0x7ffff7d7ac5a if main == 0"
	  -ex "set detach-on-fork off"
	  -ex "set follow-fork-mode child"
	  -ex c
	  -ex "inferior 1"
	  -ex "b marker"
	  -ex c

... where 0x7ffff7d7ac5a is the exact address of the vfork syscall
(which can be found by looking at gdb.log).

The important part of the above is that a vfork syscall creates inferior
2, then inferior 2 executes until exit, then we switch back to inferior
1 and try to resume it.

The uncaught exception happens here:

    #4  0x00005596969d81a9 in error (fmt=0x559692da9e40 "Cannot execute this command while the target is running.\nUse the \"interrupt\" command to stop the target\nand then try again.")
        at /home/simark/src/binutils-gdb/gdbsupport/errors.cc:43
    #5  0x0000559695af6f66 in remote_target::putpkt_binary (this=0x617000038080, buf=0x559692da4380 "qSymbol::", cnt=9) at /home/simark/src/binutils-gdb/gdb/remote.c:9560
    #6  0x0000559695af6aaf in remote_target::putpkt (this=0x617000038080, buf=0x559692da4380 "qSymbol::") at /home/simark/src/binutils-gdb/gdb/remote.c:9518
    #7  0x0000559695ab50dc in remote_target::remote_check_symbols (this=0x617000038080) at /home/simark/src/binutils-gdb/gdb/remote.c:5141
    #8  0x0000559695b3cccf in remote_new_objfile (objfile=0x0) at /home/simark/src/binutils-gdb/gdb/remote.c:14600
    #9  0x0000559693bc52a9 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x61b0000167f8: 0x559695b3cb1d <remote_new_objfile(objfile*)>) at /usr/include/c++/11.2.0/bits/invoke.h:61
    #10 0x0000559693bb2848 in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x61b0000167f8: 0x559695b3cb1d <remote_new_objfile(objfile*)>) at /usr/include/c++/11.2.0/bits/invoke.h:111
    #11 0x0000559693b8dddf in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7ffe0bae0590: 0x0) at /usr/include/c++/11.2.0/bits/std_function.h:291
    #12 0x00005596956374b2 in std::function<void (objfile*)>::operator()(objfile*) const (this=0x61b0000167f8, __args#0=0x0) at /usr/include/c++/11.2.0/bits/std_function.h:560
    #13 0x0000559695633c64 in gdb::observers::observable<objfile*>::notify (this=0x55969ef5c480 <gdb::observers::new_objfile>, args#0=0x0) at /home/simark/src/binutils-gdb/gdb/../gdbsupport/observable.h:150
    #14 0x0000559695df6cc2 in clear_symtab_users (add_flags=...) at /home/simark/src/binutils-gdb/gdb/symfile.c:2873
    #15 0x000055969574c263 in program_space::~program_space (this=0x6120000c8a40, __in_chrg=<optimized out>) at /home/simark/src/binutils-gdb/gdb/progspace.c:154
    #16 0x0000559694fc086b in delete_inferior (inf=0x61700003bf80) at /home/simark/src/binutils-gdb/gdb/inferior.c:205
    #17 0x0000559694fc341f in prune_inferiors () at /home/simark/src/binutils-gdb/gdb/inferior.c:390
    #18 0x0000559695017ada in fetch_inferior_event () at /home/simark/src/binutils-gdb/gdb/infrun.c:4293
    #19 0x0000559694f629e6 in inferior_event_handler (event_type=INF_REG_EVENT) at /home/simark/src/binutils-gdb/gdb/inf-loop.c:41
    #20 0x0000559695b3b0e3 in remote_async_serial_handler (scb=0x6250001ef100, context=0x6170000380a8) at /home/simark/src/binutils-gdb/gdb/remote.c:14466
    #21 0x0000559695c59eb7 in run_async_handler_and_reschedule (scb=0x6250001ef100) at /home/simark/src/binutils-gdb/gdb/ser-base.c:138
    #22 0x0000559695c5a42a in fd_event (error=0, context=0x6250001ef100) at /home/simark/src/binutils-gdb/gdb/ser-base.c:189
    #23 0x00005596969d9ebf in handle_file_event (file_ptr=0x60700005af40, ready_mask=1) at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:574
    #24 0x00005596969da7fa in gdb_wait_for_event (block=0) at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:700
    #25 0x00005596969d8539 in gdb_do_one_event () at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:212

If I enable "set debug infrun" just before the last continue, we see:

    (gdb) continue
    Continuing.
    [infrun] clear_proceed_status_thread: 965604.965604.0
    [infrun] proceed: enter
      [infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT
      [infrun] scoped_disable_commit_resumed: reason=proceeding
      [infrun] start_step_over: enter
        [infrun] start_step_over: stealing global queue of threads to step, length = 0
        [infrun] operator(): step-over queue now empty
      [infrun] start_step_over: exit
      [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [965604.965604.0] at 0x7ffff7d7ac5c
      [infrun] do_target_resume: resume_ptid=965604.0.0, step=0, sig=GDB_SIGNAL_0
      [infrun] prepare_to_wait: prepare_to_wait
      [infrun] reset: reason=proceeding
      [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target remote
      [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target remote
    [infrun] proceed: exit
    [infrun] fetch_inferior_event: enter
      [infrun] scoped_disable_commit_resumed: reason=handling event
      [infrun] do_target_wait: Found 2 inferiors, starting at #1
      [infrun] random_pending_event_thread: None found.
      [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
      [infrun] print_target_wait_results:   965604.965604.0 [Thread 965604.965604],
      [infrun] print_target_wait_results:   status->kind = VFORK_DONE
      [infrun] handle_inferior_event: status->kind = VFORK_DONE
      [infrun] context_switch: Switching context from 0.0.0 to 965604.965604.0
      [infrun] handle_vfork_done: not waiting for a vfork-done event
      [infrun] start_step_over: enter
        [infrun] start_step_over: stealing global queue of threads to step, length = 0
        [infrun] operator(): step-over queue now empty
      [infrun] start_step_over: exit
      [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [965604.965604.0] at 0x7ffff7d7ac5c
      [infrun] do_target_resume: resume_ptid=965604.0.0, step=0, sig=GDB_SIGNAL_0
      [infrun] prepare_to_wait: prepare_to_wait
      [infrun] reset: reason=handling event
      [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target remote
      [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target remote
    terminate called after throwing an instance of 'gdb_exception_error'

What happens is:

 - After doing the "continue" on inferior 1, the remote target gives us
   a VFORK_DONE event.  The core ignores it and resumes inferior 1.
 - Since prune_inferiors is now called after each handled event, in
   fetch_inferior_event, it is called after we handled that VFORK_DONE
   event and resumed inferior 1.
 - Inferior 2 is pruned, which (see backtrace above) causes its program
   space to be deleted, which clears the symtabs for that program space,
   which calls the new_objfile observable and remote_new_objfile
   observer (with a nullptr objfile, to indicate that the previously
   loaded symbols have been discarded), which calls
   remote_check_symbols.

remote_check_symbols is the function that sends the qSymbol packet, to
let the remote side ask for symbol addresses.  The problem is that the
remote target is working in all-stop / sync mode and is currently
resumed.  It has sent a vCont packet to resume the target and is waiting
for a stop reply.  It can't send any packets in the mean time.  That
causes the exception to be thrown.

This wasn't a problem before, when prune_inferiors was called in
normal_stop, because it was always called at a time the target was not
resumed.

I realized there are simpler ways to trigger this problem.  You only
need to load a symbol file (which triggers remote_check_symbols) while
sync execution is ongoing:

    $ ./gdb -q --data-directory=data-directory -nx a.out \
          -ex "tar ext :1234" \
	  -ex "tb main" \
	  -ex c \
	  -ex "c&" \
	  -ex "add-inferior" \
	  -ex "inferior 2" \
	  -ex "file /bin/ls"
    * aborts because of the same exception *

My initial fix for it was to return early from new_objfile if objfile is
nullptr, since that means that we removed some or all symbols.  I don't
think that's a good idea, because it might be important to send qSymbol
after removing symbols.  This could help a remote target realize that
the symbols it used previously are now gone.  I don't think that
GDBserver uses that, but I think it could be used like that in theory.
It also wouldn't help with that last example, since objfile wouldn't be
nullptr.

The proposed fix is instead to return early from remote_check_symbols if
rs->waiting_for_stop_reply is set.  If I understand correctly, it's not
really necessary to use the more complete check that putpkt_binary uses:

  if (!target_is_non_stop_p ()
      && target_is_async_p ()
      && rs->waiting_for_stop_reply)

... because if rs->waiting_for_stop_reply is set, it means we're using
the all-stop / sync mode.

With this patch, I see the testsuite going back to the same state as
before 152a17495663.

Note that when I try my "load symbol file while background execution"
example with this patch applied, GDB still crashes with the same
uncaught exception, but it's another method that causes it
(remote_hostio_send_command), meaning we are getting further in the
execution.  But it would be out of the scope of this patch to fix that.

Change-Id: Ica643145bcc03115248290fd310cadab8ec8371c
---
 gdb/remote.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gdb/remote.c b/gdb/remote.c
index 562cc586f2bf..a104f4a57a80 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -5124,6 +5124,10 @@ remote_target::remote_check_symbols ()
   if (!target_has_execution ())
     return;
 
+  remote_state *rs = get_remote_state ();
+  if (rs->waiting_for_stop_reply)
+    return;
+
   if (packet_support (PACKET_qSymbol) == PACKET_DISABLE)
     return;
 
-- 
2.35.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] gdb/infrun: make fetch_inferior_event restore thread if exited or signalled
  2022-04-24  3:20 ` [PATCH 1/2] gdb/infrun: make fetch_inferior_event restore thread if exited or signalled Simon Marchi
@ 2022-04-25 17:21   ` Andrew Burgess
  2022-04-26 12:43     ` Simon Marchi
  2022-04-29 12:37   ` Pedro Alves
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Burgess @ 2022-04-25 17:21 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> writes:

> Commit 152a1749566 ("gdb: prune inferiors at end of
> fetch_inferior_event, fix intermittent failure of
> gdb.threads/fork-plus-threads.exp") introduced some follow-fork-related
> test failures, such as:
>
>     info inferiors^M
>       Num  Description       Connection           Executable        ^M
>     * 1    process 634972    1 (native)           /home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork ^M
>       2    process 634975    1 (native)           /home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork ^M
>     (gdb) PASS: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: info inferiors
>     inferior 2^M
>     [Switching to inferior 2 [process 634975] (/home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork)]^M
>     [Switching to thread 2.1 (Thread 0x7ffff7c9a740 (LWP 634975))]^M
>     #0  0x00007ffff7d7abf7 in _Fork () from /usr/lib/libc.so.6^M
>     (gdb) PASS: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: inferior 2
>     continue^M
>     Continuing.^M
>     [Inferior 2 (process 634975) exited normally]^M
>     [Switching to Thread 0x7ffff7c9a740 (LWP 634972)]^M
>     (gdb) PASS: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: continue until exit at continue unfollowed inferior to end
>     break callee^M
>     Breakpoint 2 at 0x555555555160: file /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/foll-fork.c, line 9.^M
>     (gdb) FAIL: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: break callee
>
> What happens here is:
>
>  - inferior 2 is selected
>  - we continue, leading to inferior 2's exit
>  - we set breakpoint, expect 2 locations, but only one location is
>    resolved
>
> Reading between the lines, we understand that inferior 2 got pruned,
> when it shouldn't have been.
>
> The issue can be reproduced by hand with:
>
>     $ ./gdb -q --data-directory=data-directory testsuite/outputs/gdb.base/foll-fork/foll-fork -ex "set detach-on-fork off" -ex start -ex "next 2" -ex "inferior 2" -ex "set debug infrun"
>     ...
>     Temporary breakpoint 1, main () at /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/foll-fork.c:14
>     14        int  v = 5;
>     [New inferior 2 (process 637627)]
>     [Thread debugging using libthread_db enabled]
>     Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1".
>     17        if (pid == 0) /* set breakpoint here */
>     [Switching to inferior 2 [process 637627] (/home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork)]
>     [Switching to thread 2.1 (Thread 0x7ffff7c9a740 (LWP 637627))]
>     #0  0x00007ffff7d7abf7 in _Fork () from /usr/lib/libc.so.6
>     (gdb) continue
>     Continuing.
>     [infrun] clear_proceed_status_thread: 637627.637627.0
>     [infrun] proceed: enter
>       [infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT
>       [infrun] scoped_disable_commit_resumed: reason=proceeding
>       [infrun] start_step_over: enter
>         [infrun] start_step_over: stealing global queue of threads to step, length = 0
>         [infrun] operator(): step-over queue now empty
>       [infrun] start_step_over: exit
>       [infrun] proceed: start: resuming threads, all-stop-on-top-of-non-stop
>         [infrun] proceed: resuming 637627.637627.0
>         [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [637627.637627.0] at 0x7ffff7d7abf7
>         [infrun] do_target_resume: resume_ptid=637627.637627.0, step=0, sig=GDB_SIGNAL_0
>         [infrun] infrun_async: enable=1
>         [infrun] prepare_to_wait: prepare_to_wait
>       [infrun] proceed: end: resuming threads, all-stop-on-top-of-non-stop
>       [infrun] reset: reason=proceeding
>       [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target native
>       [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target native
>       [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target native
>     [infrun] proceed: exit
>     [infrun] fetch_inferior_event: enter
>       [infrun] scoped_disable_commit_resumed: reason=handling event
>       [infrun] do_target_wait: Found 2 inferiors, starting at #1
>       [infrun] random_pending_event_thread: None found.
>       [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
>       [infrun] print_target_wait_results:   637627.637627.0 [process 637627],
>       [infrun] print_target_wait_results:   status->kind = EXITED, exit_status = 0
>       [infrun] handle_inferior_event: status->kind = EXITED, exit_status = 0
>     [Inferior 2 (process 637627) exited normally]
>       [infrun] stop_waiting: stop_waiting
>       [infrun] stop_all_threads: start: reason=presenting stop to user in all-stop, inf=-1
>         [infrun] stop_all_threads: pass=0, iterations=0
>         [infrun] stop_all_threads:   637624.637624.0 not executing
>         [infrun] stop_all_threads: pass=1, iterations=1
>         [infrun] stop_all_threads:   637624.637624.0 not executing
>         [infrun] stop_all_threads: done
>       [infrun] stop_all_threads: end: reason=presenting stop to user in all-stop, inf=-1
>     [Switching to Thread 0x7ffff7c9a740 (LWP 637624)]
>       [infrun] infrun_async: enable=0
>       [infrun] reset: reason=handling event
>       [infrun] maybe_set_commit_resumed_all_targets: not requesting commit-resumed for target native, no resumed threads
>     (gdb) [infrun] fetch_inferior_event: exit
>     (gdb) info inferiors
>       Num  Description       Connection           Executable
>     * 1    process 637624    1 (native)           /home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork
>     (gdb) i th
>       Id   Target Id                                      Frame
>     * 1    Thread 0x7ffff7c9a740 (LWP 637624) "foll-fork" main () at /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/foll-fork.c:17
>
> After handling the EXITED event for inferior 2, inferior 2 should have
> stayed the current inferior, which should have prevented it from getting
> pruned.  When debugging, we find that when getting at the
> prune_inferiors call, the current inferior is inferior 1.  Further
> debugging shows that prior to the call to
> clean_up_just_stopped_threads_fsms, the current inferior is inferior 2,
> and after, it's inferior 1.  Then, back in fetch_inferior_event, the
> restore_thread object is disabled, due to:
>
> 	    /* If we got a TARGET_WAITKIND_NO_RESUMED event, then the
> 	       previously selected thread is gone.  We have two
> 	       choices - switch to no thread selected, or restore the
> 	       previously selected thread (now exited).  We chose the
> 	       later, just because that's what GDB used to do.  After
> 	       this, "info threads" says "The current thread <Thread
> 	       ID 2> has terminated." instead of "No thread
> 	       selected.".  */
> 	    if (!non_stop
> 		&& cmd_done
> 		&& ecs->ws.kind () != TARGET_WAITKIND_NO_RESUMED)
> 	      restore_thread.dont_restore ();
>
> So in the end, inferior 1 stays current, and inferior 2 gets wrongfully
> pruned.
>
> I'd say clean_up_just_stopped_threads_fsms is the culprit here.  It
> actually attempts to restore the event_thread to be current at the end,
> after the loop (I presume the current thread on entry is always supposed
> to be the event thread).  But in this case, the event is of kind EXITED,
> and ecs->event_thread is not set, so the current inferior isn't
> restored.
>
> Fix that by using scoped_restore_current_thread.  If there is no current
> thread, scoped_restore_current_thread will still restore the current
> inferior, and that's what we want.
>
> Random note: the thread_info object for inferior 2's thread is never
> freed.  It is held (by refcount) by the restore_thread object in
> fetch_inferior_event, while the inferior's thread list gets cleared, in
> the exit event processing.  When the refcount reaches 0 (when the
> restore_thread object is destroyed), there's nothing that actually
> deletes the thread_info object.  And I think that nothing in GDB points
> to it anymore, so it leaks.  I don't want to fix that in this patch, but
> thought it would be good to mention it, in case somebody has an idea for
> how to fix that.

Could you file this in bugzilla?

The rest of the change LGTM.

Thanks,
Andrew

>
> Change-Id: Ibc7df543e2c46aad5f3b9250b28c3fb5912be4e8
> ---
>  gdb/infrun.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index 4e7ca803c798..6dccb7a2ff2a 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -4052,12 +4052,19 @@ reinstall_readline_callback_handler_cleanup ()
>  static void
>  clean_up_just_stopped_threads_fsms (struct execution_control_state *ecs)
>  {
> +  /* The first clean_up call below assumes the event thread is the current
> +     one.  */
> +  if (ecs->event_thread != nullptr)
> +    gdb_assert (ecs->event_thread == inferior_thread ());
> +
>    if (ecs->event_thread != nullptr
>        && ecs->event_thread->thread_fsm () != nullptr)
>      ecs->event_thread->thread_fsm ()->clean_up (ecs->event_thread);
>  
>    if (!non_stop)
>      {
> +      scoped_restore_current_thread restore_thread;
> +
>        for (thread_info *thr : all_non_exited_threads ())
>  	{
>  	  if (thr->thread_fsm () == nullptr)
> @@ -4068,9 +4075,6 @@ clean_up_just_stopped_threads_fsms (struct execution_control_state *ecs)
>  	  switch_to_thread (thr);
>  	  thr->thread_fsm ()->clean_up (thr);
>  	}
> -
> -      if (ecs->event_thread != nullptr)
> -	switch_to_thread (ecs->event_thread);
>      }
>  }
>  
> -- 
> 2.35.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply
  2022-04-24  3:20 ` [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply Simon Marchi
@ 2022-04-26 10:27   ` Andrew Burgess
  2022-04-26 14:15     ` Simon Marchi
  2022-04-29 12:50   ` Pedro Alves
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Burgess @ 2022-04-26 10:27 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> writes:

> Commit 152a17495663 ("gdb: prune inferiors at end of
> fetch_inferior_event, fix intermittent failure of
> gdb.threads/fork-plus-threads.exp") broke
> gdb.base/vfork-follow-parent.exp with the native-gdbserver board:
>
>     (gdb) PASS: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: inferior 1
>     print unblock_parent = 1^M
>     $1 = 1^M
>     (gdb) PASS: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: print unblock_parent = 1
>     continue^M
>     Continuing.^M
>     terminate called after throwing an instance of 'gdb_exception_error'^M
>
> I can manually reproduce the issue by running (with gdbserver running on
> port 1234) with the following command (just the commands that the test
> does as a one liner):
>
>     $ ./gdb -q --data-directory=data-directory \
>           testsuite/outputs/gdb.base/step-over-syscall/step-over-vfork \

It was a little confusing that the example you give above is for one
test, but this one liner is for another.  Given that both tests fail, it
might be nicer to mention that both tests fail, but quote from the test
you are about to use as an example...

> 	  -ex "tar rem :1234" \

You can simplify this to just:

  -ex "target remote | gdbserver - path/to/executable"

then it really is a one liners.


> 	  -ex "b main" \
> 	  -ex c \
> 	  -ex "d 1" \

And this can be just:

  -ex "start"

right?


> 	  -ex "set displaced-stepping off" \
> 	  -ex "b *0x7ffff7d7ac5a if main == 0"
> 	  -ex "set detach-on-fork off"
> 	  -ex "set follow-fork-mode child"
> 	  -ex c
> 	  -ex "inferior 1"
> 	  -ex "b marker"
> 	  -ex c

You're missing the backslashes from the last 7 lines here.



>
> ... where 0x7ffff7d7ac5a is the exact address of the vfork syscall
> (which can be found by looking at gdb.log).
>
> The important part of the above is that a vfork syscall creates inferior
> 2, then inferior 2 executes until exit, then we switch back to inferior
> 1 and try to resume it.
>
> The uncaught exception happens here:
>
>     #4  0x00005596969d81a9 in error (fmt=0x559692da9e40 "Cannot execute this command while the target is running.\nUse the \"interrupt\" command to stop the target\nand then try again.")
>         at /home/simark/src/binutils-gdb/gdbsupport/errors.cc:43
>     #5  0x0000559695af6f66 in remote_target::putpkt_binary (this=0x617000038080, buf=0x559692da4380 "qSymbol::", cnt=9) at /home/simark/src/binutils-gdb/gdb/remote.c:9560
>     #6  0x0000559695af6aaf in remote_target::putpkt (this=0x617000038080, buf=0x559692da4380 "qSymbol::") at /home/simark/src/binutils-gdb/gdb/remote.c:9518
>     #7  0x0000559695ab50dc in remote_target::remote_check_symbols (this=0x617000038080) at /home/simark/src/binutils-gdb/gdb/remote.c:5141
>     #8  0x0000559695b3cccf in remote_new_objfile (objfile=0x0) at /home/simark/src/binutils-gdb/gdb/remote.c:14600
>     #9  0x0000559693bc52a9 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x61b0000167f8: 0x559695b3cb1d <remote_new_objfile(objfile*)>) at /usr/include/c++/11.2.0/bits/invoke.h:61
>     #10 0x0000559693bb2848 in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x61b0000167f8: 0x559695b3cb1d <remote_new_objfile(objfile*)>) at /usr/include/c++/11.2.0/bits/invoke.h:111
>     #11 0x0000559693b8dddf in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7ffe0bae0590: 0x0) at /usr/include/c++/11.2.0/bits/std_function.h:291
>     #12 0x00005596956374b2 in std::function<void (objfile*)>::operator()(objfile*) const (this=0x61b0000167f8, __args#0=0x0) at /usr/include/c++/11.2.0/bits/std_function.h:560
>     #13 0x0000559695633c64 in gdb::observers::observable<objfile*>::notify (this=0x55969ef5c480 <gdb::observers::new_objfile>, args#0=0x0) at /home/simark/src/binutils-gdb/gdb/../gdbsupport/observable.h:150
>     #14 0x0000559695df6cc2 in clear_symtab_users (add_flags=...) at /home/simark/src/binutils-gdb/gdb/symfile.c:2873
>     #15 0x000055969574c263 in program_space::~program_space (this=0x6120000c8a40, __in_chrg=<optimized out>) at /home/simark/src/binutils-gdb/gdb/progspace.c:154
>     #16 0x0000559694fc086b in delete_inferior (inf=0x61700003bf80) at /home/simark/src/binutils-gdb/gdb/inferior.c:205
>     #17 0x0000559694fc341f in prune_inferiors () at /home/simark/src/binutils-gdb/gdb/inferior.c:390
>     #18 0x0000559695017ada in fetch_inferior_event () at /home/simark/src/binutils-gdb/gdb/infrun.c:4293
>     #19 0x0000559694f629e6 in inferior_event_handler (event_type=INF_REG_EVENT) at /home/simark/src/binutils-gdb/gdb/inf-loop.c:41
>     #20 0x0000559695b3b0e3 in remote_async_serial_handler (scb=0x6250001ef100, context=0x6170000380a8) at /home/simark/src/binutils-gdb/gdb/remote.c:14466
>     #21 0x0000559695c59eb7 in run_async_handler_and_reschedule (scb=0x6250001ef100) at /home/simark/src/binutils-gdb/gdb/ser-base.c:138
>     #22 0x0000559695c5a42a in fd_event (error=0, context=0x6250001ef100) at /home/simark/src/binutils-gdb/gdb/ser-base.c:189
>     #23 0x00005596969d9ebf in handle_file_event (file_ptr=0x60700005af40, ready_mask=1) at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:574
>     #24 0x00005596969da7fa in gdb_wait_for_event (block=0) at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:700
>     #25 0x00005596969d8539 in gdb_do_one_event () at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:212
>
> If I enable "set debug infrun" just before the last continue, we see:
>
>     (gdb) continue
>     Continuing.
>     [infrun] clear_proceed_status_thread: 965604.965604.0
>     [infrun] proceed: enter
>       [infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT
>       [infrun] scoped_disable_commit_resumed: reason=proceeding
>       [infrun] start_step_over: enter
>         [infrun] start_step_over: stealing global queue of threads to step, length = 0
>         [infrun] operator(): step-over queue now empty
>       [infrun] start_step_over: exit
>       [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [965604.965604.0] at 0x7ffff7d7ac5c
>       [infrun] do_target_resume: resume_ptid=965604.0.0, step=0, sig=GDB_SIGNAL_0
>       [infrun] prepare_to_wait: prepare_to_wait
>       [infrun] reset: reason=proceeding
>       [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target remote
>       [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target remote
>     [infrun] proceed: exit
>     [infrun] fetch_inferior_event: enter
>       [infrun] scoped_disable_commit_resumed: reason=handling event
>       [infrun] do_target_wait: Found 2 inferiors, starting at #1
>       [infrun] random_pending_event_thread: None found.
>       [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
>       [infrun] print_target_wait_results:   965604.965604.0 [Thread 965604.965604],
>       [infrun] print_target_wait_results:   status->kind = VFORK_DONE
>       [infrun] handle_inferior_event: status->kind = VFORK_DONE
>       [infrun] context_switch: Switching context from 0.0.0 to 965604.965604.0
>       [infrun] handle_vfork_done: not waiting for a vfork-done event
>       [infrun] start_step_over: enter
>         [infrun] start_step_over: stealing global queue of threads to step, length = 0
>         [infrun] operator(): step-over queue now empty
>       [infrun] start_step_over: exit
>       [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [965604.965604.0] at 0x7ffff7d7ac5c
>       [infrun] do_target_resume: resume_ptid=965604.0.0, step=0, sig=GDB_SIGNAL_0
>       [infrun] prepare_to_wait: prepare_to_wait
>       [infrun] reset: reason=handling event
>       [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target remote
>       [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target remote
>     terminate called after throwing an instance of 'gdb_exception_error'
>
> What happens is:
>
>  - After doing the "continue" on inferior 1, the remote target gives us
>    a VFORK_DONE event.  The core ignores it and resumes inferior 1.
>  - Since prune_inferiors is now called after each handled event, in
>    fetch_inferior_event, it is called after we handled that VFORK_DONE
>    event and resumed inferior 1.
>  - Inferior 2 is pruned, which (see backtrace above) causes its program
>    space to be deleted, which clears the symtabs for that program space,
>    which calls the new_objfile observable and remote_new_objfile
>    observer (with a nullptr objfile, to indicate that the previously
>    loaded symbols have been discarded), which calls
>    remote_check_symbols.
>
> remote_check_symbols is the function that sends the qSymbol packet, to
> let the remote side ask for symbol addresses.  The problem is that the
> remote target is working in all-stop / sync mode and is currently
> resumed.  It has sent a vCont packet to resume the target and is waiting
> for a stop reply.  It can't send any packets in the mean time.  That
> causes the exception to be thrown.
>
> This wasn't a problem before, when prune_inferiors was called in
> normal_stop, because it was always called at a time the target was not
> resumed.
>
> I realized there are simpler ways to trigger this problem.  You only
> need to load a symbol file (which triggers remote_check_symbols) while
> sync execution is ongoing:
>
>     $ ./gdb -q --data-directory=data-directory -nx a.out \
>           -ex "tar ext :1234" \
> 	  -ex "tb main" \
> 	  -ex c \

Use 'start' ?

> 	  -ex "c&" \
> 	  -ex "add-inferior" \
> 	  -ex "inferior 2" \
> 	  -ex "file /bin/ls"
>     * aborts because of the same exception *
>
> My initial fix for it was to return early from new_objfile if objfile is
> nullptr, since that means that we removed some or all symbols.  I don't
> think that's a good idea, because it might be important to send qSymbol
> after removing symbols.  This could help a remote target realize that
> the symbols it used previously are now gone.  I don't think that
> GDBserver uses that, but I think it could be used like that in theory.
> It also wouldn't help with that last example, since objfile wouldn't be
> nullptr.
>
> The proposed fix is instead to return early from remote_check_symbols if
> rs->waiting_for_stop_reply is set.

What I didn't understand when reading this patch is that in the previous
paragraph you seem to argue that sending qSymbol, though not currently
"required" is a good thing, and we should preserve that behaviour.

Then in the second paragraph you suggest that the fix is to just not
send qSymbol when the target can't respond.  But there's no proposal to
send qSymbol later... so isn't your solution at odds with what you just
said?

I'm not suggesting that we should necessarily rush to send qSymbol at
some deferred time, I guess I'm just not understanding the two previous
paragraphs, and why the solution is OK, given what you just said.


>                                      If I understand correctly, it's not
> really necessary to use the more complete check that putpkt_binary uses:
>
>   if (!target_is_non_stop_p ()
>       && target_is_async_p ()
>       && rs->waiting_for_stop_reply)
>
> ... because if rs->waiting_for_stop_reply is set, it means we're using
> the all-stop / sync mode.
>
> With this patch, I see the testsuite going back to the same state as
> before 152a17495663.
>
> Note that when I try my "load symbol file while background execution"
> example with this patch applied, GDB still crashes with the same
> uncaught exception, but it's another method that causes it
> (remote_hostio_send_command), meaning we are getting further in the
> execution.  But it would be out of the scope of this patch to fix
> that.

Should there be a bugzilla entry created for the remaining issue?

>
> Change-Id: Ica643145bcc03115248290fd310cadab8ec8371c
> ---
>  gdb/remote.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/gdb/remote.c b/gdb/remote.c
> index 562cc586f2bf..a104f4a57a80 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -5124,6 +5124,10 @@ remote_target::remote_check_symbols ()
>    if (!target_has_execution ())
>      return;
>  
> +  remote_state *rs = get_remote_state ();
> +  if (rs->waiting_for_stop_reply)
> +    return;
> +
>    if (packet_support (PACKET_qSymbol) == PACKET_DISABLE)
>      return;
>  

I'm OK with this change.  My assumption would be that symbol values held
on the remote end will be associated with a specific inferior (or maybe
with an objfile in use by one or more inferiors).  As the remote knows
when the inferior has exited I assume the remote knows to throw away
symbols that are no longer valid/used.  As a result I don't really see
any urgency to send qSymbol unless some other use case comes up.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] gdb/infrun: make fetch_inferior_event restore thread if exited or signalled
  2022-04-25 17:21   ` Andrew Burgess
@ 2022-04-26 12:43     ` Simon Marchi
  0 siblings, 0 replies; 16+ messages in thread
From: Simon Marchi @ 2022-04-26 12:43 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches

> Could you file this in bugzilla?

Done: https://sourceware.org/bugzilla/show_bug.cgi?id=29091

> The rest of the change LGTM.

Thanks!

Simon

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply
  2022-04-26 10:27   ` Andrew Burgess
@ 2022-04-26 14:15     ` Simon Marchi
  2022-04-27 10:11       ` Andrew Burgess
  0 siblings, 1 reply; 16+ messages in thread
From: Simon Marchi @ 2022-04-26 14:15 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches



On 2022-04-26 06:27, Andrew Burgess wrote:
> Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> writes:
> 
>> Commit 152a17495663 ("gdb: prune inferiors at end of
>> fetch_inferior_event, fix intermittent failure of
>> gdb.threads/fork-plus-threads.exp") broke
>> gdb.base/vfork-follow-parent.exp with the native-gdbserver board:
>>
>>     (gdb) PASS: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: inferior 1
>>     print unblock_parent = 1^M
>>     $1 = 1^M
>>     (gdb) PASS: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: print unblock_parent = 1
>>     continue^M
>>     Continuing.^M
>>     terminate called after throwing an instance of 'gdb_exception_error'^M
>>
>> I can manually reproduce the issue by running (with gdbserver running on
>> port 1234) with the following command (just the commands that the test
>> does as a one liner):
>>
>>     $ ./gdb -q --data-directory=data-directory \
>>           testsuite/outputs/gdb.base/step-over-syscall/step-over-vfork \
> 
> It was a little confusing that the example you give above is for one
> test, but this one liner is for another.  Given that both tests fail, it
> might be nicer to mention that both tests fail, but quote from the test
> you are about to use as an example...

Hmm right, that wasn't intended.  Fixed.

>> 	  -ex "tar rem :1234" \
> 
> You can simplify this to just:
> 
>   -ex "target remote | gdbserver - path/to/executable"

Fixed.

> 
> then it really is a one liners.
> 
> 
>> 	  -ex "b main" \
>> 	  -ex c \
>> 	  -ex "d 1" \
> 
> And this can be just:
> 
>   -ex "start"
> 
> right?

No, since we don't run, we continue.  It could be tbreak instead of
break + delete though.

> 
> 
>> 	  -ex "set displaced-stepping off" \
>> 	  -ex "b *0x7ffff7d7ac5a if main == 0"
>> 	  -ex "set detach-on-fork off"
>> 	  -ex "set follow-fork-mode child"
>> 	  -ex c
>> 	  -ex "inferior 1"
>> 	  -ex "b marker"
>> 	  -ex c
> 
> You're missing the backslashes from the last 7 lines here.

Oops, right.  I did the formatting in the email so it wouldn't all be
one a single line, and forgot some.

>> I realized there are simpler ways to trigger this problem.  You only
>> need to load a symbol file (which triggers remote_check_symbols) while
>> sync execution is ongoing:
>>
>>     $ ./gdb -q --data-directory=data-directory -nx a.out \
>>           -ex "tar ext :1234" \
>> 	  -ex "tb main" \
>> 	  -ex c \
> 
> Use 'start' ?

No, since we continue and don't run.

> 
>> 	  -ex "c&" \
>> 	  -ex "add-inferior" \
>> 	  -ex "inferior 2" \
>> 	  -ex "file /bin/ls"
>>     * aborts because of the same exception *
>>
>> My initial fix for it was to return early from new_objfile if objfile is
>> nullptr, since that means that we removed some or all symbols.  I don't
>> think that's a good idea, because it might be important to send qSymbol
>> after removing symbols.  This could help a remote target realize that
>> the symbols it used previously are now gone.  I don't think that
>> GDBserver uses that, but I think it could be used like that in theory.
>> It also wouldn't help with that last example, since objfile wouldn't be
>> nullptr.
>>
>> The proposed fix is instead to return early from remote_check_symbols if
>> rs->waiting_for_stop_reply is set.
> 
> What I didn't understand when reading this patch is that in the previous
> paragraph you seem to argue that sending qSymbol, though not currently
> "required" is a good thing, and we should preserve that behaviour.
> 
> Then in the second paragraph you suggest that the fix is to just not
> send qSymbol when the target can't respond.  But there's no proposal to
> send qSymbol later... so isn't your solution at odds with what you just
> said?

In this case, the target is resumed, so we for sure can't send it.  But
if the target is stopped and the user uses the "file" command to discard
symbols, remote_check_symbols is called with nullptr.  Today, we send a
qSymbol in this case, but with the suggestion above implemented, we
wouldn't.  And that feels like a regression.

You're right though that with the fix I propose in this patch, we never
end up sending a qSymbol.  But given that the current behavior is that
we crash... that doesn't feel like a regression.  Sending a deferred
qSymbol the next time we stop would be an option, if someone ever needs
it.  But at the moment, this is just an imaginary use case.  I'm not
sure what using the "file" command by itself on a resumed target really
means, given it doesn't really change what's loaded in memory.

On a side-note, it seems like we don't send a qSymbol when a library is
unloaded (i.e. dlclose).  It seems to me like it would be useful to do
so.  If some module on the remote target gets enabled when a particular
library is loaded by the inferior, it would make sense that it gets
disabled when that library is unloaded.

On a second side-note, we don't send qSymbol when doing a "run".  In
practice, most processes load some library early in their startup, so we
do end up sending a qSymbol then.  If the remote side needs any symbol
defined in the main objfile, they'll pick it up then.  But that could be
a problem for static executables that don't load any library.  I gave
it a try, and found that GDB still sends a qSymbol in reaction to adding
the dummy vsyscall objfile in add_vsyscall_page.  So if it wasn't for
that, we would never send a qSymbol on run.  That's kind of a hidden bug
right now.

> I'm not suggesting that we should necessarily rush to send qSymbol at
> some deferred time, I guess I'm just not understanding the two previous
> paragraphs, and why the solution is OK, given what you just said.

The gist of it is: the fix prevents trying to send qSymbol when we
really can't, but we keep sending it when we can.

>>                                      If I understand correctly, it's not
>> really necessary to use the more complete check that putpkt_binary uses:
>>
>>   if (!target_is_non_stop_p ()
>>       && target_is_async_p ()
>>       && rs->waiting_for_stop_reply)
>>
>> ... because if rs->waiting_for_stop_reply is set, it means we're using
>> the all-stop / sync mode.
>>
>> With this patch, I see the testsuite going back to the same state as
>> before 152a17495663.
>>
>> Note that when I try my "load symbol file while background execution"
>> example with this patch applied, GDB still crashes with the same
>> uncaught exception, but it's another method that causes it
>> (remote_hostio_send_command), meaning we are getting further in the
>> execution.  But it would be out of the scope of this patch to fix
>> that.
> 
> Should there be a bugzilla entry created for the remaining issue?

I can file one when this patch is merged (it wouldn't make sense to file
a bugzilla entry for a behavior that isn't in master).

>> Change-Id: Ica643145bcc03115248290fd310cadab8ec8371c
>> ---
>>  gdb/remote.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/gdb/remote.c b/gdb/remote.c
>> index 562cc586f2bf..a104f4a57a80 100644
>> --- a/gdb/remote.c
>> +++ b/gdb/remote.c
>> @@ -5124,6 +5124,10 @@ remote_target::remote_check_symbols ()
>>    if (!target_has_execution ())
>>      return;
>>  
>> +  remote_state *rs = get_remote_state ();
>> +  if (rs->waiting_for_stop_reply)
>> +    return;
>> +
>>    if (packet_support (PACKET_qSymbol) == PACKET_DISABLE)
>>      return;
>>  
> 
> I'm OK with this change.  My assumption would be that symbol values held
> on the remote end will be associated with a specific inferior (or maybe
> with an objfile in use by one or more inferiors).  As the remote knows
> when the inferior has exited I assume the remote knows to throw away
> symbols that are no longer valid/used.  As a result I don't really see
> any urgency to send qSymbol unless some other use case comes up.

The remote side has no notion of inferiors, only of actually running
processes.  Speaking of GDBserver (I can't tell how other remote
implementations use qSymbol), the thread_db symbols are per-process.  So
when a process exits, it all gets destroyed, and if a new process
appears (which could be the same inferior in GDB), GDBserver will ask
for new symbols when it gets the chance.

GDBserver also uses qSymbol for libinproctrace symbols, for fast
tracepoints.  Those are stored in globals, so I can only think that it's
broken when using multiple processes.

Simon

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply
  2022-04-26 14:15     ` Simon Marchi
@ 2022-04-27 10:11       ` Andrew Burgess
  2022-04-27 14:01         ` Simon Marchi
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Burgess @ 2022-04-27 10:11 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> writes:

> On 2022-04-26 06:27, Andrew Burgess wrote:
>> Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> writes:
>> 
>>> Commit 152a17495663 ("gdb: prune inferiors at end of
>>> fetch_inferior_event, fix intermittent failure of
>>> gdb.threads/fork-plus-threads.exp") broke
>>> gdb.base/vfork-follow-parent.exp with the native-gdbserver board:
>>>
>>>     (gdb) PASS: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: inferior 1
>>>     print unblock_parent = 1^M
>>>     $1 = 1^M
>>>     (gdb) PASS: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: print unblock_parent = 1
>>>     continue^M
>>>     Continuing.^M
>>>     terminate called after throwing an instance of 'gdb_exception_error'^M
>>>
>>> I can manually reproduce the issue by running (with gdbserver running on
>>> port 1234) with the following command (just the commands that the test
>>> does as a one liner):
>>>
>>>     $ ./gdb -q --data-directory=data-directory \
>>>           testsuite/outputs/gdb.base/step-over-syscall/step-over-vfork \
>> 
>> It was a little confusing that the example you give above is for one
>> test, but this one liner is for another.  Given that both tests fail, it
>> might be nicer to mention that both tests fail, but quote from the test
>> you are about to use as an example...
>
> Hmm right, that wasn't intended.  Fixed.
>
>>> 	  -ex "tar rem :1234" \
>> 
>> You can simplify this to just:
>> 
>>   -ex "target remote | gdbserver - path/to/executable"
>
> Fixed.
>
>> 
>> then it really is a one liners.
>> 
>> 
>>> 	  -ex "b main" \
>>> 	  -ex c \
>>> 	  -ex "d 1" \
>> 
>> And this can be just:
>> 
>>   -ex "start"
>> 
>> right?
>
> No, since we don't run, we continue.  It could be tbreak instead of
> break + delete though.
>
>> 
>> 
>>> 	  -ex "set displaced-stepping off" \
>>> 	  -ex "b *0x7ffff7d7ac5a if main == 0"
>>> 	  -ex "set detach-on-fork off"
>>> 	  -ex "set follow-fork-mode child"
>>> 	  -ex c
>>> 	  -ex "inferior 1"
>>> 	  -ex "b marker"
>>> 	  -ex c
>> 
>> You're missing the backslashes from the last 7 lines here.
>
> Oops, right.  I did the formatting in the email so it wouldn't all be
> one a single line, and forgot some.
>
>>> I realized there are simpler ways to trigger this problem.  You only
>>> need to load a symbol file (which triggers remote_check_symbols) while
>>> sync execution is ongoing:
>>>
>>>     $ ./gdb -q --data-directory=data-directory -nx a.out \
>>>           -ex "tar ext :1234" \
>>> 	  -ex "tb main" \
>>> 	  -ex c \
>> 
>> Use 'start' ?
>
> No, since we continue and don't run.
>
>> 
>>> 	  -ex "c&" \
>>> 	  -ex "add-inferior" \
>>> 	  -ex "inferior 2" \
>>> 	  -ex "file /bin/ls"
>>>     * aborts because of the same exception *
>>>
>>> My initial fix for it was to return early from new_objfile if objfile is
>>> nullptr, since that means that we removed some or all symbols.  I don't
>>> think that's a good idea, because it might be important to send qSymbol
>>> after removing symbols.  This could help a remote target realize that
>>> the symbols it used previously are now gone.  I don't think that
>>> GDBserver uses that, but I think it could be used like that in theory.
>>> It also wouldn't help with that last example, since objfile wouldn't be
>>> nullptr.
>>>
>>> The proposed fix is instead to return early from remote_check_symbols if
>>> rs->waiting_for_stop_reply is set.
>> 
>> What I didn't understand when reading this patch is that in the previous
>> paragraph you seem to argue that sending qSymbol, though not currently
>> "required" is a good thing, and we should preserve that behaviour.
>> 
>> Then in the second paragraph you suggest that the fix is to just not
>> send qSymbol when the target can't respond.  But there's no proposal to
>> send qSymbol later... so isn't your solution at odds with what you just
>> said?
>
> In this case, the target is resumed, so we for sure can't send it.  But
> if the target is stopped and the user uses the "file" command to discard
> symbols, remote_check_symbols is called with nullptr.  Today, we send a
> qSymbol in this case, but with the suggestion above implemented, we
> wouldn't.  And that feels like a regression.
>
> You're right though that with the fix I propose in this patch, we never
> end up sending a qSymbol.  But given that the current behavior is that
> we crash... that doesn't feel like a regression.

Sorry, I guess I misunderstood, I thought that the crash had been
introduced with commit 152a17495663 ("gdb: prune inferiors at end of
fetch_inferior_event, fix intermittent failure of
gdb.threads/fork-plus-threads.exp") - at least that was the impression I
got from the first paragraph of your commit message.

I confess I didn't go back and check the behaviour before that commit
was merged, but I'd kind of assumed we didn't crash then, but instead
ended up sending qSymbol later.  That was where my confusion was coming
from, comparing pre-152a17495663 behaviour to the behaviour after this
commit.


>                                                   Sending a deferred
> qSymbol the next time we stop would be an option, if someone ever needs
> it.  But at the moment, this is just an imaginary use case.  I'm not
> sure what using the "file" command by itself on a resumed target really
> means, given it doesn't really change what's loaded in memory.

I agree that we could/should wait until an actual use case presents
itself before we worry too much about some of these details.

>
> On a side-note, it seems like we don't send a qSymbol when a library is
> unloaded (i.e. dlclose).  It seems to me like it would be useful to do
> so.  If some module on the remote target gets enabled when a particular
> library is loaded by the inferior, it would make sense that it gets
> disabled when that library is unloaded.
>
> On a second side-note, we don't send qSymbol when doing a "run".  In
> practice, most processes load some library early in their startup, so we
> do end up sending a qSymbol then.  If the remote side needs any symbol
> defined in the main objfile, they'll pick it up then.  But that could be
> a problem for static executables that don't load any library.  I gave
> it a try, and found that GDB still sends a qSymbol in reaction to adding
> the dummy vsyscall objfile in add_vsyscall_page.  So if it wasn't for
> that, we would never send a qSymbol on run.  That's kind of a hidden bug
> right now.
>
>> I'm not suggesting that we should necessarily rush to send qSymbol at
>> some deferred time, I guess I'm just not understanding the two previous
>> paragraphs, and why the solution is OK, given what you just said.
>
> The gist of it is: the fix prevents trying to send qSymbol when we
> really can't, but we keep sending it when we can.
>
>>>                                      If I understand correctly, it's not
>>> really necessary to use the more complete check that putpkt_binary uses:
>>>
>>>   if (!target_is_non_stop_p ()
>>>       && target_is_async_p ()
>>>       && rs->waiting_for_stop_reply)
>>>
>>> ... because if rs->waiting_for_stop_reply is set, it means we're using
>>> the all-stop / sync mode.
>>>
>>> With this patch, I see the testsuite going back to the same state as
>>> before 152a17495663.
>>>
>>> Note that when I try my "load symbol file while background execution"
>>> example with this patch applied, GDB still crashes with the same
>>> uncaught exception, but it's another method that causes it
>>> (remote_hostio_send_command), meaning we are getting further in the
>>> execution.  But it would be out of the scope of this patch to fix
>>> that.
>> 
>> Should there be a bugzilla entry created for the remaining issue?
>
> I can file one when this patch is merged (it wouldn't make sense to file
> a bugzilla entry for a behavior that isn't in master).
>
>>> Change-Id: Ica643145bcc03115248290fd310cadab8ec8371c
>>> ---
>>>  gdb/remote.c | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/gdb/remote.c b/gdb/remote.c
>>> index 562cc586f2bf..a104f4a57a80 100644
>>> --- a/gdb/remote.c
>>> +++ b/gdb/remote.c
>>> @@ -5124,6 +5124,10 @@ remote_target::remote_check_symbols ()
>>>    if (!target_has_execution ())
>>>      return;
>>>  
>>> +  remote_state *rs = get_remote_state ();
>>> +  if (rs->waiting_for_stop_reply)
>>> +    return;
>>> +
>>>    if (packet_support (PACKET_qSymbol) == PACKET_DISABLE)
>>>      return;
>>>  
>> 
>> I'm OK with this change.  My assumption would be that symbol values held
>> on the remote end will be associated with a specific inferior (or maybe
>> with an objfile in use by one or more inferiors).  As the remote knows
>> when the inferior has exited I assume the remote knows to throw away
>> symbols that are no longer valid/used.  As a result I don't really see
>> any urgency to send qSymbol unless some other use case comes up.
>
> The remote side has no notion of inferiors, only of actually running
> processes.

You're right to be exact with the terminology, but I was using inferior
for process.  And though the "inferior" class might only exist in GDB,
the inferior terminology has definitely leaked over into gdbserver - the
code base is littered with references to inferiors.

>             Speaking of GDBserver (I can't tell how other remote
> implementations use qSymbol), the thread_db symbols are per-process.  So
> when a process exits, it all gets destroyed, and if a new process
> appears (which could be the same inferior in GDB), GDBserver will ask
> for new symbols when it gets the chance.

This is how I'd expect things to operate.  I mean, if a server
implementation was smart it could share symbol information between
processes that used the same object files, loaded at the same address,
but that's implementation detail.  Really, symbols are per-process.

> GDBserver also uses qSymbol for libinproctrace symbols, for fast
> tracepoints.  Those are stored in globals, so I can only think that it's
> broken when using multiple processes.

I agree with this assesment.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply
  2022-04-27 10:11       ` Andrew Burgess
@ 2022-04-27 14:01         ` Simon Marchi
  0 siblings, 0 replies; 16+ messages in thread
From: Simon Marchi @ 2022-04-27 14:01 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches

> Sorry, I guess I misunderstood, I thought that the crash had been
> introduced with commit 152a17495663 ("gdb: prune inferiors at end of
> fetch_inferior_event, fix intermittent failure of
> gdb.threads/fork-plus-threads.exp") - at least that was the impression I
> got from the first paragraph of your commit message.
> 
> I confess I didn't go back and check the behaviour before that commit
> was merged, but I'd kind of assumed we didn't crash then, but instead
> ended up sending qSymbol later.  That was where my confusion was coming
> from, comparing pre-152a17495663 behaviour to the behaviour after this
> commit.

To be clear, commit 152a17495663 introduced a crash in a test, causing
a testsuite regression.  But that crash could be already be triggered 
pre-152a17495663, with some other commands (that probably nobody uses).
So 152a17495663 made that pre-existing bug apparent, so it made it
necessary to fix it.  Hope that's clear.
>>                                                   Sending a deferred
>> qSymbol the next time we stop would be an option, if someone ever needs
>> it.  But at the moment, this is just an imaginary use case.  I'm not
>> sure what using the "file" command by itself on a resumed target really
>> means, given it doesn't really change what's loaded in memory.
> 
> I agree that we could/should wait until an actual use case presents
> itself before we worry too much about some of these details.

Ack.

>> The remote side has no notion of inferiors, only of actually running
>> processes.
> 
> You're right to be exact with the terminology, but I was using inferior
> for process.  And though the "inferior" class might only exist in GDB,
> the inferior terminology has definitely leaked over into gdbserver - the
> code base is littered with references to inferiors.

Heh, you're right.

Thanks for the review, I'll wait a bit to see if Pedro has any comments on
the fixes.

Simon

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] gdb/infrun: make fetch_inferior_event restore thread if exited or signalled
  2022-04-24  3:20 ` [PATCH 1/2] gdb/infrun: make fetch_inferior_event restore thread if exited or signalled Simon Marchi
  2022-04-25 17:21   ` Andrew Burgess
@ 2022-04-29 12:37   ` Pedro Alves
  2022-04-29 13:25     ` Simon Marchi
  1 sibling, 1 reply; 16+ messages in thread
From: Pedro Alves @ 2022-04-29 12:37 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 2022-04-24 04:20, Simon Marchi via Gdb-patches wrote:

> Change-Id: Ibc7df543e2c46aad5f3b9250b28c3fb5912be4e8
> ---
>  gdb/infrun.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index 4e7ca803c798..6dccb7a2ff2a 100644

Looks fine.

> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -4052,12 +4052,19 @@ reinstall_readline_callback_handler_cleanup ()
>  static void
>  clean_up_just_stopped_threads_fsms (struct execution_control_state *ecs)
>  {
> +  /* The first clean_up call below assumes the event thread is the current
> +     one.  */
> +  if (ecs->event_thread != nullptr)
> +    gdb_assert (ecs->event_thread == inferior_thread ());
> +
>    if (ecs->event_thread != nullptr
>        && ecs->event_thread->thread_fsm () != nullptr)
>      ecs->event_thread->thread_fsm ()->clean_up (ecs->event_thread);
>  
>    if (!non_stop)
>      {
> +      scoped_restore_current_thread restore_thread;
> +
>        for (thread_info *thr : all_non_exited_threads ())
>  	{
>  	  if (thr->thread_fsm () == nullptr)
> @@ -4068,9 +4075,6 @@ clean_up_just_stopped_threads_fsms (struct execution_control_state *ecs)
>  	  switch_to_thread (thr);
>  	  thr->thread_fsm ()->clean_up (thr);

We could also look at eliminating the need to switch the thread
when we're about to call a function that takes an explicit thread_info pointer,
but I'm fine with what you have, especially since in practice that would just probably
mean a scoped_restore_current_thread per clean_up call instead of a just a single one for
all threads...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply
  2022-04-24  3:20 ` [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply Simon Marchi
  2022-04-26 10:27   ` Andrew Burgess
@ 2022-04-29 12:50   ` Pedro Alves
  2022-04-29 14:53     ` Simon Marchi
  1 sibling, 1 reply; 16+ messages in thread
From: Pedro Alves @ 2022-04-29 12:50 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 2022-04-24 04:20, Simon Marchi via Gdb-patches wrote:

> What happens is:
> 
>  - After doing the "continue" on inferior 1, the remote target gives us
>    a VFORK_DONE event.  The core ignores it and resumes inferior 1.
>  - Since prune_inferiors is now called after each handled event, in
>    fetch_inferior_event, it is called after we handled that VFORK_DONE
>    event and resumed inferior 1.
>  - Inferior 2 is pruned, which (see backtrace above) causes its program
>    space to be deleted, which clears the symtabs for that program space,
>    which calls the new_objfile observable and remote_new_objfile
>    observer (with a nullptr objfile, to indicate that the previously
>    loaded symbols have been discarded), which calls
>    remote_check_symbols.
> 
> remote_check_symbols is the function that sends the qSymbol packet, to
> let the remote side ask for symbol addresses.  The problem is that the
> remote target is working in all-stop / sync mode and is currently
> resumed.  It has sent a vCont packet to resume the target and is waiting
> for a stop reply.  It can't send any packets in the mean time.  That
> causes the exception to be thrown.

The inferiors that are pruned, their corresponding processes are gone, otherwise
they wouldn't be pruned.  So it seems wrong to even send a qSymbol packet at all,
as qSymbol on the remote end will be interpreted as "new symbols in the current process",
and the remote current process will have nothing to do with the pruned inferior.

Note this comment and target_has_execution check:

 /* Symbol look-up.  */

 void
 remote_target::remote_check_symbols ()
 {
   char *tmp;
   int end;

   /* The remote side has no concept of inferiors that aren't running
      yet, it only knows about running processes.  If we're connected
      but our current inferior is not running, we should not invite the
      remote target to request symbol lookups related to its
      (unrelated) current process.  */
   if (!target_has_execution ())
     return;

So if we're getting past this check, it seems to me that this means that we're calling
remote_check_symbols in the context of some other inferior other than the one that
is being pruned, which seems to be to be the bug here.

Note a bit below that function does:

  /* Make sure the remote is pointing at the right process.  Note
     there's no way to select "no process".  */
  set_general_process ();

... again, this is going to set the remote focus on gdb's current inferior,
but that inferior has no relation to the inferior that was pruned, so the
remote end is going to be querying symbols for a process that hasn't
really changed its symbols.

> 
> This wasn't a problem before, when prune_inferiors was called in
> normal_stop, because it was always called at a time the target was not
> resumed.
> 
> I realized there are simpler ways to trigger this problem.  You only
> need to load a symbol file (which triggers remote_check_symbols) while
> sync execution is ongoing:
> 
>     $ ./gdb -q --data-directory=data-directory -nx a.out \
>           -ex "tar ext :1234" \
> 	  -ex "tb main" \
> 	  -ex c \
> 	  -ex "c&" \
> 	  -ex "add-inferior" \
> 	  -ex "inferior 2" \
> 	  -ex "file /bin/ls"
>     * aborts because of the same exception *
> 

This is a valid problem, but a different problem, IMO.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] gdb/infrun: make fetch_inferior_event restore thread if exited or signalled
  2022-04-29 12:37   ` Pedro Alves
@ 2022-04-29 13:25     ` Simon Marchi
  0 siblings, 0 replies; 16+ messages in thread
From: Simon Marchi @ 2022-04-29 13:25 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 2022-04-29 08:37, Pedro Alves wrote:
> On 2022-04-24 04:20, Simon Marchi via Gdb-patches wrote:
>
>> Change-Id: Ibc7df543e2c46aad5f3b9250b28c3fb5912be4e8
>> ---
>>  gdb/infrun.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/gdb/infrun.c b/gdb/infrun.c
>> index 4e7ca803c798..6dccb7a2ff2a 100644
>
> Looks fine.
>
>> --- a/gdb/infrun.c
>> +++ b/gdb/infrun.c
>> @@ -4052,12 +4052,19 @@ reinstall_readline_callback_handler_cleanup ()
>>  static void
>>  clean_up_just_stopped_threads_fsms (struct execution_control_state *ecs)
>>  {
>> +  /* The first clean_up call below assumes the event thread is the current
>> +     one.  */
>> +  if (ecs->event_thread != nullptr)
>> +    gdb_assert (ecs->event_thread == inferior_thread ());
>> +
>>    if (ecs->event_thread != nullptr
>>        && ecs->event_thread->thread_fsm () != nullptr)
>>      ecs->event_thread->thread_fsm ()->clean_up (ecs->event_thread);
>>
>>    if (!non_stop)
>>      {
>> +      scoped_restore_current_thread restore_thread;
>> +
>>        for (thread_info *thr : all_non_exited_threads ())
>>  	{
>>  	  if (thr->thread_fsm () == nullptr)
>> @@ -4068,9 +4075,6 @@ clean_up_just_stopped_threads_fsms (struct execution_control_state *ecs)
>>  	  switch_to_thread (thr);
>>  	  thr->thread_fsm ()->clean_up (thr);
>
> We could also look at eliminating the need to switch the thread
> when we're about to call a function that takes an explicit thread_info pointer,
> but I'm fine with what you have, especially since in practice that would just probably
> mean a scoped_restore_current_thread per clean_up call instead of a just a single one for
> all threads...

I'll push this commit as-is, as it's the "obvious" fix, but I'll take a
look at pushing down the switch_to_thread down to the clean_up
implementations.  I think it's a good step towards reducing dependency
on global state.

Simon

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply
  2022-04-29 12:50   ` Pedro Alves
@ 2022-04-29 14:53     ` Simon Marchi
  2022-04-29 15:39       ` Pedro Alves
  2022-04-29 15:47       ` Pedro Alves
  0 siblings, 2 replies; 16+ messages in thread
From: Simon Marchi @ 2022-04-29 14:53 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 2022-04-29 08:50, Pedro Alves wrote:
> On 2022-04-24 04:20, Simon Marchi via Gdb-patches wrote:
>
>> What happens is:
>>
>>  - After doing the "continue" on inferior 1, the remote target gives us
>>    a VFORK_DONE event.  The core ignores it and resumes inferior 1.
>>  - Since prune_inferiors is now called after each handled event, in
>>    fetch_inferior_event, it is called after we handled that VFORK_DONE
>>    event and resumed inferior 1.
>>  - Inferior 2 is pruned, which (see backtrace above) causes its program
>>    space to be deleted, which clears the symtabs for that program space,
>>    which calls the new_objfile observable and remote_new_objfile
>>    observer (with a nullptr objfile, to indicate that the previously
>>    loaded symbols have been discarded), which calls
>>    remote_check_symbols.
>>
>> remote_check_symbols is the function that sends the qSymbol packet, to
>> let the remote side ask for symbol addresses.  The problem is that the
>> remote target is working in all-stop / sync mode and is currently
>> resumed.  It has sent a vCont packet to resume the target and is waiting
>> for a stop reply.  It can't send any packets in the mean time.  That
>> causes the exception to be thrown.
>
> The inferiors that are pruned, their corresponding processes are gone, otherwise
> they wouldn't be pruned.  So it seems wrong to even send a qSymbol packet at all,
> as qSymbol on the remote end will be interpreted as "new symbols in the current process",
> and the remote current process will have nothing to do with the pruned inferior.
>
> Note this comment and target_has_execution check:
>
>  /* Symbol look-up.  */
>
>  void
>  remote_target::remote_check_symbols ()
>  {
>    char *tmp;
>    int end;
>
>    /* The remote side has no concept of inferiors that aren't running
>       yet, it only knows about running processes.  If we're connected
>       but our current inferior is not running, we should not invite the
>       remote target to request symbol lookups related to its
>       (unrelated) current process.  */
>    if (!target_has_execution ())
>      return;
>
> So if we're getting past this check, it seems to me that this means that we're calling
> remote_check_symbols in the context of some other inferior other than the one that
> is being pruned, which seems to be to be the bug here.

I see.  Current inferior is 1 (the one that remains).

This is tricky, because we are here in reaction to deleting the program
space, which is done in reaction to deleting the last inferior of that
program space.  Ideally, we would make inferior 2 (the one being
deleted) current somewhere, so that target_has_execution would see that
we have no execution.  I tried using
switch_to_program_space_and_thread in ~program_space instead of
set_current_program_space (so it would set the current inferior too),
but that doesn't work as inferior 2 is already removed from the inferior
list.

Since inferiors can share a program space, perhaps the call to
target_has_execution here is just wrong, since it only checks one (the
current) inferior.  What if you have two inferiors sharing a program
space, one with execution and one without.  If remote_new_objfile gets
called with the one without execution as the current inferior, we'll
conclude we have no execution and won't send qSymbol.  But maybe we
should have sent it, for the benefit of the inferior that has execution.

So maybe we need to change the question from "does the current inferior
have execution" to "does any inferior in the affected program space have
execution", since the current inferior is not always reliable.  And
anyway, an objfile change is pspace-specific, not inferior-specific.

Something like that:

diff --git a/gdb/remote.c b/gdb/remote.c
index ff98024cd78..9888aab88bd 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -5116,14 +5116,6 @@ remote_target::remote_check_symbols ()
   char *tmp;
   int end;

-  /* The remote side has no concept of inferiors that aren't running
-     yet, it only knows about running processes.  If we're connected
-     but our current inferior is not running, we should not invite the
-     remote target to request symbol lookups related to its
-     (unrelated) current process.  */
-  if (!target_has_execution ())
-    return;
-
   if (packet_support (PACKET_qSymbol) == PACKET_DISABLE)
     return;

@@ -14595,6 +14587,22 @@ remote_new_objfile (struct objfile *objfile)
   if (current_inferior ()->in_initial_library_scan)
     return;

+  bool has_execution = false;
+  program_space *pspace = current_program_space;
+  for (inferior *inf : all_inferiors (remote))
+    {
+      if (inf->pspace != pspace)
+       continue;
+
+      has_execution = remote->has_execution (inf);
+
+      if (has_execution)
+       break;
+    }
+
+  if (!has_execution)
+    return;
+
   remote->remote_check_symbols ();
 }

Note that here, all_inferiors(remote) actually only iterates on inferior
1, as inferior 2 is already deleted at that point.  But it fixes the
test.

> Note a bit below that function does:
>
>   /* Make sure the remote is pointing at the right process.  Note
>      there's no way to select "no process".  */
>   set_general_process ();



>
> ... again, this is going to set the remote focus on gdb's current inferior,
> but that inferior has no relation to the inferior that was pruned, so the
> remote end is going to be querying symbols for a process that hasn't
> really changed its symbols.

... and since the current program space in GDB isn't the one connected
to that inferior, the remote is probably going to receive wrong
answers.

That made me think: I know we have an invariant that "there is always a
current inferior", but I think this is a case where it would be useful
to have no current inferior, in ~program_space.  We set the current
program space in ~program_space, which makes sense because that tells
the callees what program_space is being changed.  But the current
inferior will always mismatch the current program space at this point.
So any code using the current inferior (whether it is to do some target
call or something else) will probably always be wrong.

>> This wasn't a problem before, when prune_inferiors was called in
>> normal_stop, because it was always called at a time the target was not
>> resumed.
>>
>> I realized there are simpler ways to trigger this problem.  You only
>> need to load a symbol file (which triggers remote_check_symbols) while
>> sync execution is ongoing:
>>
>>     $ ./gdb -q --data-directory=data-directory -nx a.out \
>>           -ex "tar ext :1234" \
>> 	  -ex "tb main" \
>> 	  -ex c \
>> 	  -ex "c&" \
>> 	  -ex "add-inferior" \
>> 	  -ex "inferior 2" \
>> 	  -ex "file /bin/ls"
>>     * aborts because of the same exception *
>>
>
> This is a valid problem, but a different problem, IMO.

I think you're right.

Simon

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply
  2022-04-29 14:53     ` Simon Marchi
@ 2022-04-29 15:39       ` Pedro Alves
  2022-04-29 15:56         ` Simon Marchi
  2022-04-29 15:47       ` Pedro Alves
  1 sibling, 1 reply; 16+ messages in thread
From: Pedro Alves @ 2022-04-29 15:39 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 2022-04-29 15:53, Simon Marchi wrote:
> On 2022-04-29 08:50, Pedro Alves wrote:

> Since inferiors can share a program space, perhaps the call to
> target_has_execution here is just wrong, since it only checks one (the
> current) inferior.  What if you have two inferiors sharing a program
> space, one with execution and one without.  If remote_new_objfile gets
> called with the one without execution as the current inferior, we'll
> conclude we have no execution and won't send qSymbol.  But maybe we
> should have sent it, for the benefit of the inferior that has execution.
> 
> So maybe we need to change the question from "does the current inferior
> have execution" to "does any inferior in the affected program space have
> execution", since the current inferior is not always reliable.  And
> anyway, an objfile change is pspace-specific, not inferior-specific.

I think so, yes.

> 
> Something like that:
> 
> diff --git a/gdb/remote.c b/gdb/remote.c
> index ff98024cd78..9888aab88bd 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -5116,14 +5116,6 @@ remote_target::remote_check_symbols ()
>    char *tmp;
>    int end;
> 
> -  /* The remote side has no concept of inferiors that aren't running
> -     yet, it only knows about running processes.  If we're connected
> -     but our current inferior is not running, we should not invite the
> -     remote target to request symbol lookups related to its
> -     (unrelated) current process.  */
> -  if (!target_has_execution ())
> -    return;
> -
>    if (packet_support (PACKET_qSymbol) == PACKET_DISABLE)
>      return;
> 
> @@ -14595,6 +14587,22 @@ remote_new_objfile (struct objfile *objfile)
>    if (current_inferior ()->in_initial_library_scan)
>      return;
> 
> +  bool has_execution = false;
> +  program_space *pspace = current_program_space;
> +  for (inferior *inf : all_inferiors (remote))
> +    {
> +      if (inf->pspace != pspace)
> +       continue;
> +
> +      has_execution = remote->has_execution (inf);
> +
> +      if (has_execution)
> +       break;
> +    }
> +
> +  if (!has_execution)
> +    return;
> +
>    remote->remote_check_symbols ();
>  }
> 

remote_new_objfile has another reference to current_inferior():

 /* Function to be called whenever a new objfile (shlib) is detected.  */
 static void
 remote_new_objfile (struct objfile *objfile)
 {
 ...
   if (current_inferior ()->in_initial_library_scan)
     return;

   remote->remote_check_symbols ();
 }

so seems to me that it should be here that we'd loop over inferiors.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply
  2022-04-29 14:53     ` Simon Marchi
  2022-04-29 15:39       ` Pedro Alves
@ 2022-04-29 15:47       ` Pedro Alves
  1 sibling, 0 replies; 16+ messages in thread
From: Pedro Alves @ 2022-04-29 15:47 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 2022-04-29 15:53, Simon Marchi wrote:
> That made me think: I know we have an invariant that "there is always a
> current inferior", but I think this is a case where it would be useful
> to have no current inferior, in ~program_space.  We set the current
> program space in ~program_space, which makes sense because that tells
> the callees what program_space is being changed.  But the current
> inferior will always mismatch the current program space at this point.
> So any code using the current inferior (whether it is to do some target
> call or something else) will probably always be wrong.

Seems worth a try.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply
  2022-04-29 15:39       ` Pedro Alves
@ 2022-04-29 15:56         ` Simon Marchi
  0 siblings, 0 replies; 16+ messages in thread
From: Simon Marchi @ 2022-04-29 15:56 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches


> remote_new_objfile has another reference to current_inferior():
>
>  /* Function to be called whenever a new objfile (shlib) is detected.  */
>  static void
>  remote_new_objfile (struct objfile *objfile)
>  {
>  ...
>    if (current_inferior ()->in_initial_library_scan)
>      return;
>
>    remote->remote_check_symbols ();
>  }
>
> so seems to me that it should be here that we'd loop over inferiors.

Ok, I'll make a v2 with that.

Simon


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-04-29 15:57 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-24  3:20 [PATCH 0/2] Fix regressions caused by prune_threads patch Simon Marchi
2022-04-24  3:20 ` [PATCH 1/2] gdb/infrun: make fetch_inferior_event restore thread if exited or signalled Simon Marchi
2022-04-25 17:21   ` Andrew Burgess
2022-04-26 12:43     ` Simon Marchi
2022-04-29 12:37   ` Pedro Alves
2022-04-29 13:25     ` Simon Marchi
2022-04-24  3:20 ` [PATCH 2/2] gdb/remote: return early from remote_check_symbols if waiting for stop reply Simon Marchi
2022-04-26 10:27   ` Andrew Burgess
2022-04-26 14:15     ` Simon Marchi
2022-04-27 10:11       ` Andrew Burgess
2022-04-27 14:01         ` Simon Marchi
2022-04-29 12:50   ` Pedro Alves
2022-04-29 14:53     ` Simon Marchi
2022-04-29 15:39       ` Pedro Alves
2022-04-29 15:56         ` Simon Marchi
2022-04-29 15:47       ` Pedro Alves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).