[PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199)

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199)
@ 2020-07-06 19:02 Pedro Alves
  2020-07-06 19:02 ` [PATCH 1/7] Fix spurious unhandled remote %Stop notifications Pedro Alves
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Pedro Alves @ 2020-07-06 19:02 UTC (permalink / raw)
  To: gdb-patches

This patch series fixes PR 26199, a GDB 10 blocker.

I discussed how to fix this with Simon, and we came to the conclusion
that we can fix it by removing code.  Easy.  :-) That's the last patch
in the series.

Well, not so easy, actually... :-/

Doing that alone caused gdb.multi/multi-target.exp to regress.  And
the reason was that the fix for PR 261299 made that testcase trip on a
few latent bugs...  One of those bugs also caused a similar 100% cpu
busy loop.

And then, while fixing those, I added a new test scenario to
gdb.multi/multi-target.exp to exercise the TARGET_WAITKIND_NO_RESUMED
handling fixes in this series.  That new test requires sending Ctrl-C
to GDB after the test is done with, in order to cleanly kill gdbserver
via "monitor exit".  But, that Ctrl-C didn't work, due to an issue
with GDB's terminal handling, GDB would just hang...

That's all fixed by this series.

Pedro Alves (6):
  Fix spurious unhandled remote %Stop notifications
  Fix latent bug in target_pass_ctrlc
  Avoid constant stream of TARGET_WAITKIND_NO_RESUMED
  Fix handle_no_resumed w/ multiple targets
  Make handle_no_resumed transfer terminal
  Testcase for previous handle_no_resumed fixes

Simon Marchi (1):
  Fix GDB busy loop when interrupting non-stop program (PR 26199)

 gdb/infrun.c                             | 116 +++++++++++++++++++++----------
 gdb/remote.c                             |  15 +++-
 gdb/target.c                             |   2 +-
 gdb/testsuite/gdb.multi/multi-target.c   |   4 +-
 gdb/testsuite/gdb.multi/multi-target.exp |  76 ++++++++++++++++++++
 5 files changed, 173 insertions(+), 40 deletions(-)

base-commit: ad8464f799a4c96c7ab8bdfec3f95846cf54f9b0
-- 
2.14.5

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/7] Fix spurious unhandled remote %Stop notifications
  2020-07-06 19:02 [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199) Pedro Alves
@ 2020-07-06 19:02 ` Pedro Alves
  2020-12-12 22:13   ` Andrew Burgess
  2020-07-06 19:02 ` [PATCH 2/7] Fix latent bug in target_pass_ctrlc Pedro Alves
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Pedro Alves @ 2020-07-06 19:02 UTC (permalink / raw)
  To: gdb-patches

In non-stop mode, remote targets mark an async event source whose
callback is supposed to result in calling remote_target::wait_ns to
either process the event queue, or acknowledge an incoming %Stop
notification.

The callback in question is remote_async_inferior_event_handler, where
we call inferior_event_handler, to end up in fetch_inferior_event ->
target_wait -> remote_target::wait -> remote_target::wait_ns.

A problem here however is that when debugging multiple targets,
fetch_inferior_event can pull events out of any target picked at
random, for event fairness.  This means that when
remote_async_inferior_event_handler returns, remote_target::wait may
have not been called at all, and thus pending notifications may have
not been acked.  Because async event sources auto-clear, when
remote_async_inferior_event_handler returns the async event handler is
no longer marked, so the event loop won't automatically call
remote_async_inferior_event_handler again to try to process the
pending remote notifications/queue.  The result is that stop events
may end up not processed, e.g., "interrupt -a" seemingly not managing
to stop all threads.

Fix this by making remote_async_inferior_event_handler mark the event
handler again before returning, if necessary.

Maybe a better fix would be to make async event handlers not
auto-clear themselves, make that the responsibility of the callback,
so that the event loop would keep calling the callback automatically.
Or, we could try making so that fetch_inferior_event would optionally
handle events only for the target that it got passed down via
parameter.  However, I don't think now just before branching is the
time to try to do any such change.

gdb/ChangeLog:

	PR gdb/26199
	* remote.c (remote_target::open_1): Pass remote target pointer as
	data to create_async_event_handler.
	(remote_async_inferior_event_handler): Mark async event handler
	before returning if the remote target still has either pending
	events or unacknowledged notifications.
---
 gdb/remote.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index f7f99dc24f..59075cb09f 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -5605,7 +5605,7 @@ remote_target::open_1 (const char *name, int from_tty, int extended_p)

   /* Register extra event sources in the event loop.  */
   rs->remote_async_inferior_event_token
-    = create_async_event_handler (remote_async_inferior_event_handler, NULL);
+    = create_async_event_handler (remote_async_inferior_event_handler, remote);
   rs->notif_state = remote_notif_state_allocate (remote);

   /* Reset the target state; these things will be queried either by
@@ -14164,6 +14164,19 @@ static void
 remote_async_inferior_event_handler (gdb_client_data data)
 {
   inferior_event_handler (INF_REG_EVENT);
+
+  remote_target *remote = (remote_target *) data;
+  remote_state *rs = remote->get_remote_state ();
+
+  /* inferior_event_handler may have consumed an event pending on the
+     infrun side without calling target_wait on the REMOTE target, or
+     may have pulled an event out of a different target.  Keep trying
+     for this remote target as long it still has either pending events
+     or unacknowledged notifications.  */
+
+  if (rs->notif_state->pending_event[notif_client_stop.id] != NULL
+      || !rs->stop_reply_queue.empty ())
+    mark_async_event_handler (rs->remote_async_inferior_event_token);
 }

 int
-- 
2.14.5

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/7] Fix latent bug in target_pass_ctrlc
  2020-07-06 19:02 [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199) Pedro Alves
  2020-07-06 19:02 ` [PATCH 1/7] Fix spurious unhandled remote %Stop notifications Pedro Alves
@ 2020-07-06 19:02 ` Pedro Alves
  2020-07-06 19:02 ` [PATCH 3/7] Avoid constant stream of TARGET_WAITKIND_NO_RESUMED Pedro Alves
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Pedro Alves @ 2020-07-06 19:02 UTC (permalink / raw)
  To: gdb-patches

We were checking the thr->executing of an exited thread.

gdb/ChangeLog:

	PR gdb/26199
	* target.c (target_pass_ctrlc): Looking at the inferiors
	non-exited threads, not all threads.
---
 gdb/target.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gdb/target.c b/gdb/target.c
index f4e4f05b5f..cd66675e8a 100644
--- a/gdb/target.c
+++ b/gdb/target.c
@@ -3274,7 +3274,7 @@ target_pass_ctrlc (void)
       if (proc_target == NULL)
 	continue;
 
-      for (thread_info *thr : inf->threads ())
+      for (thread_info *thr : inf->non_exited_threads ())
 	{
 	  /* A thread can be THREAD_STOPPED and executing, while
 	     running an infcall.  */
-- 
2.14.5


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/7] Avoid constant stream of TARGET_WAITKIND_NO_RESUMED
  2020-07-06 19:02 [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199) Pedro Alves
  2020-07-06 19:02 ` [PATCH 1/7] Fix spurious unhandled remote %Stop notifications Pedro Alves
  2020-07-06 19:02 ` [PATCH 2/7] Fix latent bug in target_pass_ctrlc Pedro Alves
@ 2020-07-06 19:02 ` Pedro Alves
  2020-07-06 19:02 ` [PATCH 4/7] Fix handle_no_resumed w/ multiple targets Pedro Alves
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Pedro Alves @ 2020-07-06 19:02 UTC (permalink / raw)
  To: gdb-patches

If we hit the synchronous execution command case described by
handle_no_resumed, and handle_no_resumed determines that the event
should be ignored, because it found a thread that is executing, we end
up in prepare_to_wait.

There, if the current target is not registered in the event loop right
now, we call mark_infrun_async_event_handler.  With that event handler
marked, the event loop calls again into fetch_inferior_event, which
calls target_wait, which returns TARGET_WAITKIND_NO_RESUMED, and we
end up in handle_no_resumed, again ignoring the event and marking
infrun_async_event_handler.  The result is that GDB is now always
keeping the CPU 100% busy in this loop, even though it continues to be
able to react to input and to real target events, because we still go
through the event-loop.

The problem is that marking of the infrun_async_event_handler in
prepare_to_wait.  That is there to handle targets that don't support
asynchronous execution.  So the correct predicate is whether async
execution is supported, not whether the target is async right now.

gdb/ChangeLog:

	PR gdb/26199
	* infrun.c (prepare_to_wait): Check target_can_async_p instead of
	target_is_async_p.
---
 gdb/infrun.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 6b655d4430..a01e0969cb 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -8116,7 +8116,11 @@ prepare_to_wait (struct execution_control_state *ecs)

   ecs->wait_some_more = 1;

-  if (!target_is_async_p ())
+  /* If the target can't async, emulate it by marking the infrun event
+     handler such that as soon as we get back to the event-loop, we
+     immediately end up in fetch_inferior_event again calling
+     target_wait.  */
+  if (!target_can_async_p ())
     mark_infrun_async_event_handler ();
 }

-- 
2.14.5

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 4/7] Fix handle_no_resumed w/ multiple targets
  2020-07-06 19:02 [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199) Pedro Alves
                   ` (2 preceding siblings ...)
  2020-07-06 19:02 ` [PATCH 3/7] Avoid constant stream of TARGET_WAITKIND_NO_RESUMED Pedro Alves
@ 2020-07-06 19:02 ` Pedro Alves
  2020-07-06 19:02 ` [PATCH 5/7] Make handle_no_resumed transfer terminal Pedro Alves
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Pedro Alves @ 2020-07-06 19:02 UTC (permalink / raw)
  To: gdb-patches

handle_no_resumed is currently not considering multiple targets.

Say you have two inferiors 1 and 2, each connected to a different
target, A and B.

Now say you set inferior 2 running, with "continue &".

Now you select a thread of inferior 1, say thread 1.2, and continue in
the foreground.  All other threads of inferior 1 are left stopped.
Thread 1.2 exits, and thus target A has no other resumed thread, so it
reports TARGET_WAITKIND_NO_RESUMED.

At this point, if both inferiors were running in the same target,
handle_no_resumed would realize that threads of inferior 2 are still
executing, so the TARGET_WAITKIND_NO_RESUMED event should be ignored.
But because handle_no_resumed only walks the threads of the current
target, it misses noticing that threads of inferior 2 are still
executing.  The fix is just to walk over all threads of all targets.

A testcase covering the use case above will be added in a following
patch.  It can't be added yet because it depends on yet another fix to
handle_no_resumed not included here.

gdb/ChangeLog:

	PR gdb/26199
	* infrun.c (handle_no_resumed): Handle multiple targets.
---
 gdb/infrun.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index a01e0969cb..0f2f45a4d2 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -5068,16 +5068,28 @@ handle_no_resumed (struct execution_control_state *ecs)
      have resumed threads _now_.  In the example above, this removes
      thread 3 from the thread list.  If thread 2 was re-resumed, we
      ignore this event.  If we find no thread resumed, then we cancel
-     the synchronous command show "no unwaited-for " to the user.  */
-  update_thread_list ();
+     the synchronous command and show "no unwaited-for " to the
+     user.  */
+
+  {
+    scoped_restore_current_thread restore_thread;
+
+    for (auto *target : all_non_exited_process_targets ())
+      {
+	switch_to_target_no_thread (target);
+	update_thread_list ();
+      }
+  }

-  for (thread_info *thread : all_non_exited_threads (ecs->target))
+  for (thread_info *thread : all_non_exited_threads ())
     {
       if (thread->executing
 	  || thread->suspend.waitstatus_pending_p)
 	{
-	  /* There were no unwaited-for children left in the target at
-	     some point, but there are now.  Just ignore.  */
+	  /* Either there were no unwaited-for children left in the
+	     target at some point, but there are now, or some target
+	     other than the eventing one has unwaited-for children
+	     left.  Just ignore.  */
 	  if (debug_infrun)
 	    fprintf_unfiltered (gdb_stdlog,
 				"infrun: TARGET_WAITKIND_NO_RESUMED "
-- 
2.14.5

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 5/7] Make handle_no_resumed transfer terminal
  2020-07-06 19:02 [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199) Pedro Alves
                   ` (3 preceding siblings ...)
  2020-07-06 19:02 ` [PATCH 4/7] Fix handle_no_resumed w/ multiple targets Pedro Alves
@ 2020-07-06 19:02 ` Pedro Alves
  2020-07-06 19:02 ` [PATCH 6/7] Testcase for previous handle_no_resumed fixes Pedro Alves
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Pedro Alves @ 2020-07-06 19:02 UTC (permalink / raw)
  To: gdb-patches

Let's consider the same use case as in the previous commit:

Say you have two inferiors 1 and 2, each connected to a different
target, A and B.

Now say you set inferior 2 running, with "continue &".

Now you select a thread of inferior 1, say thread 1.2, and continue in
the foreground.  All other threads of inferior 1 are left stopped.
Thread 1.2 exits, and thus target A has no other resumed thread, so it
reports TARGET_WAITKIND_NO_RESUMED.

At this point, because the threads of inferior 2 are still executing
the TARGET_WAITKIND_NO_RESUMED event is ignored.

Now, the user types Ctrl-C.  Because GDB had previously put inferior 1
in the foreground, the kernel sends the SIGINT to that inferior.
However, no thread in that inferior is executing right now, so ptrace
never intercepts the SIGINT -- it is never dequeued by any thread.
The result is that GDB's CLI is stuck.  There's no way to get back the
prompt (unless inferior 2 happens to report some event).

The fix in this commit is to make handle_no_resumed give the terminal
to some other inferior that still has threads executing so that a
subsequent Ctrl-C reaches that target first (and then GDB intercepts
the SIGINT).  This is a bit hacky, but seems like the best we can do
with the current design.

I think that putting all native inferiors in their own session would
help fixing this in a clean way, since with that a Ctrl-C on GDB's
terminal will _always_ reach GDB first, and then GDB can decide how to
pause the inferior.  But that's a much larger change.

The testcase added by the following patch needs this fix.

gdb/ChangeLog:

	PR gdb/26199
	* infrun.c (handle_no_resumed): Transfer terminal to inferior with
	executing threads.
---
 gdb/infrun.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 54 insertions(+), 12 deletions(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 0f2f45a4d2..158b199069 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -5071,20 +5071,52 @@ handle_no_resumed (struct execution_control_state *ecs)
      the synchronous command and show "no unwaited-for " to the
      user.  */

-  {
-    scoped_restore_current_thread restore_thread;
+  inferior *curr_inf = current_inferior ();

-    for (auto *target : all_non_exited_process_targets ())
-      {
-	switch_to_target_no_thread (target);
-	update_thread_list ();
-      }
-  }
+  scoped_restore_current_thread restore_thread;
+
+  for (auto *target : all_non_exited_process_targets ())
+    {
+      switch_to_target_no_thread (target);
+      update_thread_list ();
+    }
+
+  /* If:
+
+       - the current target has no thread executing, and
+       - the current inferior is native, and
+       - the current inferior is the one which has the terminal, and
+       - we did nothing,
+
+     then a Ctrl-C from this point on would remain stuck in the
+     kernel, until a thread resumes and dequeues it.  That would
+     result in the GDB CLI not reacting to Ctrl-C, not able to
+     interrupt the program.  To address this, if the current inferior
+     no longer has any thread executing, we give the terminal to some
+     other inferior that has at least one thread executing.  */
+  bool swap_terminal = true;
+
+  /* Whether to ignore this TARGET_WAITKIND_NO_RESUMED event, or
+     whether to report it to the user.  */
+  bool ignore_event = false;

   for (thread_info *thread : all_non_exited_threads ())
     {
-      if (thread->executing
-	  || thread->suspend.waitstatus_pending_p)
+      if (swap_terminal && thread->executing)
+	{
+	  if (thread->inf != curr_inf)
+	    {
+	      target_terminal::ours ();
+
+	      switch_to_thread (thread);
+	      target_terminal::inferior ();
+	    }
+	  swap_terminal = false;
+	}
+
+      if (!ignore_event
+	  && (thread->executing
+	      || thread->suspend.waitstatus_pending_p))
 	{
 	  /* Either there were no unwaited-for children left in the
 	     target at some point, but there are now, or some target
@@ -5094,9 +5126,19 @@ handle_no_resumed (struct execution_control_state *ecs)
 	    fprintf_unfiltered (gdb_stdlog,
 				"infrun: TARGET_WAITKIND_NO_RESUMED "
 				"(ignoring: found resumed)\n");
-	  prepare_to_wait (ecs);
-	  return 1;
+
+	  ignore_event = true;
 	}
+
+      if (ignore_event && !swap_terminal)
+	break;
+    }
+
+  if (ignore_event)
+    {
+      switch_to_inferior_no_thread (curr_inf);
+      prepare_to_wait (ecs);
+      return 1;
     }

   /* Go ahead and report the event.  */
-- 
2.14.5

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 6/7] Testcase for previous handle_no_resumed fixes
  2020-07-06 19:02 [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199) Pedro Alves
                   ` (4 preceding siblings ...)
  2020-07-06 19:02 ` [PATCH 5/7] Make handle_no_resumed transfer terminal Pedro Alves
@ 2020-07-06 19:02 ` Pedro Alves
  2020-07-06 19:02 ` [PATCH 7/7] Fix GDB busy loop when interrupting non-stop program (PR 26199) Pedro Alves
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Pedro Alves @ 2020-07-06 19:02 UTC (permalink / raw)
  To: gdb-patches

This adds a testcase that covers the scenarios described in the
previous two commits.

gdb/testsuite/ChangeLog:

	PR gdb/26199
	* gdb.multi/multi-target.c (exit_thread): New.
	(thread_start): Break loop if EXIT_THREAD.
	* gdb.multi/multi-target.exp (test_no_unwaited_for): New proc.
	(top level) Call test_no_resumed.
---
 gdb/testsuite/gdb.multi/multi-target.c   |  4 +-
 gdb/testsuite/gdb.multi/multi-target.exp | 76 ++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/gdb/testsuite/gdb.multi/multi-target.c b/gdb/testsuite/gdb.multi/multi-target.c
index 23ec6bd600..b337e59a11 100644
--- a/gdb/testsuite/gdb.multi/multi-target.c
+++ b/gdb/testsuite/gdb.multi/multi-target.c
@@ -26,12 +26,14 @@
 
 static pthread_barrier_t barrier;
 
+volatile int exit_thread;
+
 static void *
 thread_start (void *arg)
 {
   pthread_barrier_wait (&barrier);
 
-  while (1)
+  while (!exit_thread)
     sleep (1);
   return NULL;
 }
diff --git a/gdb/testsuite/gdb.multi/multi-target.exp b/gdb/testsuite/gdb.multi/multi-target.exp
index b519eda4e8..d19cee6595 100644
--- a/gdb/testsuite/gdb.multi/multi-target.exp
+++ b/gdb/testsuite/gdb.multi/multi-target.exp
@@ -439,6 +439,77 @@ proc test_info_inferiors {multi_process} {
     }
 }
 
+# Test that when there's a foreground execution command in progress, a
+# TARGET_WAITKIND_NO_RESUMED for a particular target is ignored when
+# other targets are still resumed.
+
+proc test_no_resumed {} {
+    proc test_no_resumed_infs {inf_A inf_B} {
+	global gdb_prompt
+
+	if {![setup "off"]} {
+	    untested "setup failed"
+	    return
+	}
+
+	gdb_test "thread $inf_A.2" "Switching to thread $inf_A\.2 .*" \
+	    "select thread of target A"
+
+	gdb_test_no_output "set scheduler-locking on"
+
+	gdb_test_multiple "continue &" "" {
+	    -re "Continuing.*$gdb_prompt " {
+		pass $gdb_test_name
+	    }
+	}
+
+	gdb_test "thread $inf_B.2" "Switching to thread $inf_B\.2 .*" \
+	    "select thread of target B"
+	gdb_test "p exit_thread = 1" " = 1" \
+	    "set the thread to exit on resumption"
+
+	# Wait 3 seconds.  If we see any response from GDB, such as
+	# "No unwaited-for children left." it's a bug.
+	gdb_test_multiple "continue" "continue" {
+	    -timeout 3
+	    timeout {
+		pass $gdb_test_name
+	    }
+	}
+
+	# Now stop the program (all targets).
+	send_gdb "\003"
+	gdb_test_multiple "" "send_gdb control C" {
+	    -re "received signal SIGINT.*$gdb_prompt $" {
+		pass $gdb_test_name
+	    }
+	}
+
+	gdb_test_multiple "info threads" "all threads stopped" {
+	    -re "\\\(running\\\).*$gdb_prompt $" {
+		fail $gdb_test_name
+	    }
+	    -re "$gdb_prompt $" {
+		pass $gdb_test_name
+	    }
+	}
+    }
+
+    # inferior 1 -> native
+    # inferior 2 -> extended-remote 1
+    # inferior 5 -> extended-remote 2
+    set inferiors {1 2 5}
+    foreach_with_prefix inf_A $inferiors {
+	foreach_with_prefix inf_B $inferiors {
+	    if {$inf_A == $inf_B} {
+		continue
+	    }
+	    test_no_resumed_infs $inf_A $inf_B
+	}
+    }
+}
+
+
 # Make a core file with two threads upfront.  Several tests load the
 # same core file.
 prepare_core
@@ -467,4 +538,9 @@ with_test_prefix "info-inferiors" {
     }
 }
 
+# Test TARGET_WAITKIND_NO_RESUMED handling with multiple targets.
+with_test_prefix "no-resumed" {
+    test_no_resumed
+}
+
 cleanup_gdbservers
-- 
2.14.5


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 7/7] Fix GDB busy loop when interrupting non-stop program (PR 26199)
  2020-07-06 19:02 [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199) Pedro Alves
                   ` (5 preceding siblings ...)
  2020-07-06 19:02 ` [PATCH 6/7] Testcase for previous handle_no_resumed fixes Pedro Alves
@ 2020-07-06 19:02 ` Pedro Alves
  2020-07-06 21:28 ` [PATCH 0/7] " Simon Marchi
  2020-07-10 23:02 ` Pedro Alves
  8 siblings, 0 replies; 15+ messages in thread
From: Pedro Alves @ 2020-07-06 19:02 UTC (permalink / raw)
  To: gdb-patches

From: Simon Marchi <simon.marchi@polymtl.ca>

When interrupting a program in non-stop, the program gets interrupted
correctly, but GDB busy loops (the event loop is always woken up).

Here is how to reproduce it:

 1. Start GDB: ./gdb -nx --data-directory=data-directory -ex "set non-stop 1" --args  /bin/sleep 60
 2. Run the program with "run"
 3. Interrupt with ^C.
 4. Look into htop, see GDB taking 100% CPU

Debugging `handle_file_event`, we see that the event source that wakes
up the event loop is the linux-nat one:

 (top-gdb) p file_ptr.proc
 $5 = (handler_func *) 0xb9cccd <handle_target_event(int, gdb_client_data)>
				 ^^^^^^^^^^^^^^^^^^^
					 |
					 \-- the linux-nat callback

Debugging fetch_inferior_event and do_target_wait, we see that we
don't actually call `wait` on the linux-nat target, because
inferior_matches returns false:

 auto inferior_matches = [&wait_ptid] (inferior *inf)
   {
     return (inf->process_target () != NULL
	     && (threads_are_executing (inf->process_target ())
		 || threads_are_resumed_pending_p (inf))
	     && ptid_t (inf->pid).matches (wait_ptid));
   };

because `threads_are_executing` is false.

What happens is:

 1. User types ctrl-c, that writes in the linux-nat pipe, waking up
    the event source.

 2. linux-nat's wait gets called, the SIGINT event is returned, but
    before returning, it marks the pipe again, in order for wait to
    get called again:

    /* If we requested any event, and something came out, assume there
       may be more.  If we requested a specific lwp or process, also
       assume there may be more.  */
    if (target_is_async_p ()
	&& ((ourstatus->kind != TARGET_WAITKIND_IGNORE
	     && ourstatus->kind != TARGET_WAITKIND_NO_RESUMED)
	    || ptid != minus_one_ptid))
      async_file_mark ();

 3. The SIGINT event is handled, the program is stopped, the stop
    notification is printed.

 4. The event loop is woken up again because of the `async_file_mark`
    of step 2.

 5. Because `inferior_matches` returns false, we never call
    linux-nat's wait, so the pipe stays readable.

 6. Goto 4.

Pedro says:

This commit fixes it by letting do_target_wait call target_wait even
if threads_are_executing is false.  This will normally result in the
target returning TARGET_WAITKIND_NO_RESUMED, and _not_ marking its
event source again.  This results in infrun only calling into the
target only once (i.e., breaking the busy loop).

Note that the busy loop bug didn't trigger in all-stop mode because
all-stop handles this by unregistering the target from the event loop
as soon as it was all stopped -- see
inf-loop.c:inferior_event_handler's INF_EXEC_COMPLETE handling.  If we
remove that non-stop check from inferior_event_handler, and replace
the target_has_execution check for threads_are_executing instead, it
also fixes the issue for non-stop.  I considered that as the final
solution, but decided that the solution proposed here instead is just
simpler and more future-proof design.  With the
TARGET_WAITKIND_NO_RESUMED handling fixes done in the previous
patches, I think it should be possible to always keep the target
registered in the event loop, meaning we could eliminate the
target_async(0) call from inferior_event_handler as well as most of
the target_async(1) calls in the target backends.  That would allow in
the future e.g., the remote target reporting asynchronous
notifications even if all threads are stopped.  I haven't attempted
that, though.

gdb/ChangeLog:
yyyy-mm-dd  Simon Marchi  <simon.marchi@polymtl.ca>
	    Pedro Alves  <pedro@palves.net>

	PR gdb/26199
	* infrun.c (threads_are_resumed_pending_p): Delete.
	(do_target_wait): Remove threads_are_executing and
	threads_are_resumed_pending_p checks from the inferior_matches
	lambda.  Update comments.
---
 gdb/infrun.c | 38 +++++++++++---------------------------
 1 file changed, 11 insertions(+), 27 deletions(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 158b199069..31266109a6 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -3601,23 +3601,9 @@ do_target_wait_1 (inferior *inf, ptid_t ptid,
   return event_ptid;
 }
 
-/* Returns true if INF has any resumed thread with a status
-   pending.  */
-
-static bool
-threads_are_resumed_pending_p (inferior *inf)
-{
-  for (thread_info *tp : inf->non_exited_threads ())
-    if (tp->resumed
-	&& tp->suspend.waitstatus_pending_p)
-      return true;
-
-  return false;
-}
-
 /* Wrapper for target_wait that first checks whether threads have
    pending statuses to report before actually asking the target for
-   more events. Polls for events from all inferiors/targets.  */
+   more events.  Polls for events from all inferiors/targets.  */
 
 static bool
 do_target_wait (ptid_t wait_ptid, execution_control_state *ecs, int options)
@@ -3625,20 +3611,18 @@ do_target_wait (ptid_t wait_ptid, execution_control_state *ecs, int options)
   int num_inferiors = 0;
   int random_selector;
 
-  /* For fairness, we pick the first inferior/target to poll at
-     random, and then continue polling the rest of the inferior list
-     starting from that one in a circular fashion until the whole list
-     is polled once.  */
+  /* For fairness, we pick the first inferior/target to poll at random
+     out of all inferiors that may report events, and then continue
+     polling the rest of the inferior list starting from that one in a
+     circular fashion until the whole list is polled once.  */
 
   auto inferior_matches = [&wait_ptid] (inferior *inf)
     {
       return (inf->process_target () != NULL
-	      && (threads_are_executing (inf->process_target ())
-		  || threads_are_resumed_pending_p (inf))
 	      && ptid_t (inf->pid).matches (wait_ptid));
     };
 
-  /* First see how many resumed inferiors we have.  */
+  /* First see how many matching inferiors we have.  */
   for (inferior *inf : all_inferiors ())
     if (inferior_matches (inf))
       num_inferiors++;
@@ -3649,7 +3633,7 @@ do_target_wait (ptid_t wait_ptid, execution_control_state *ecs, int options)
       return false;
     }
 
-  /* Now randomly pick an inferior out of those that were resumed.  */
+  /* Now randomly pick an inferior out of those that matched.  */
   random_selector = (int)
     ((num_inferiors * (double) rand ()) / (RAND_MAX + 1.0));
 
@@ -3658,7 +3642,7 @@ do_target_wait (ptid_t wait_ptid, execution_control_state *ecs, int options)
 			"infrun: Found %d inferiors, starting at #%d\n",
 			num_inferiors, random_selector);
 
-  /* Select the Nth inferior that was resumed.  */
+  /* Select the Nth inferior that matched.  */
 
   inferior *selected = nullptr;
 
@@ -3670,7 +3654,7 @@ do_target_wait (ptid_t wait_ptid, execution_control_state *ecs, int options)
 	  break;
 	}
 
-  /* Now poll for events out of each of the resumed inferior's
+  /* Now poll for events out of each of the matching inferior's
      targets, starting from the selected one.  */
 
   auto do_wait = [&] (inferior *inf)
@@ -3680,8 +3664,8 @@ do_target_wait (ptid_t wait_ptid, execution_control_state *ecs, int options)
     return (ecs->ws.kind != TARGET_WAITKIND_IGNORE);
   };
 
-  /* Needed in all-stop+target-non-stop mode, because we end up here
-     spuriously after the target is all stopped and we've already
+  /* Needed in 'all-stop + target-non-stop' mode, because we end up
+     here spuriously after the target is all stopped and we've already
      reported the stop to the user, polling for events.  */
   scoped_restore_current_thread restore_thread;
 
-- 
2.14.5


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199)
  2020-07-06 19:02 [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199) Pedro Alves
                   ` (6 preceding siblings ...)
  2020-07-06 19:02 ` [PATCH 7/7] Fix GDB busy loop when interrupting non-stop program (PR 26199) Pedro Alves
@ 2020-07-06 21:28 ` Simon Marchi
  2020-07-07  0:25   ` Pedro Alves
  2020-07-10 23:02 ` Pedro Alves
  8 siblings, 1 reply; 15+ messages in thread
From: Simon Marchi @ 2020-07-06 21:28 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

[-- Attachment #1: Type: text/plain, Size: 2060 bytes --]

On 2020-07-06 3:02 p.m., Pedro Alves wrote:
> This patch series fixes PR 26199, a GDB 10 blocker.
> 
> I discussed how to fix this with Simon, and we came to the conclusion
> that we can fix it by removing code.  Easy.  :-) That's the last patch
> in the series.
> 
> Well, not so easy, actually... :-/
> 
> Doing that alone caused gdb.multi/multi-target.exp to regress.  And
> the reason was that the fix for PR 261299 made that testcase trip on a
> few latent bugs...  One of those bugs also caused a similar 100% cpu
> busy loop.
> 
> And then, while fixing those, I added a new test scenario to
> gdb.multi/multi-target.exp to exercise the TARGET_WAITKIND_NO_RESUMED
> handling fixes in this series.  That new test requires sending Ctrl-C
> to GDB after the test is done with, in order to cleanly kill gdbserver
> via "monitor exit".  But, that Ctrl-C didn't work, due to an issue
> with GDB's terminal handling, GDB would just hang...
> 
> That's all fixed by this series.
> 
> Pedro Alves (6):
>   Fix spurious unhandled remote %Stop notifications
>   Fix latent bug in target_pass_ctrlc
>   Avoid constant stream of TARGET_WAITKIND_NO_RESUMED
>   Fix handle_no_resumed w/ multiple targets
>   Make handle_no_resumed transfer terminal
>   Testcase for previous handle_no_resumed fixes
> 
> Simon Marchi (1):
>   Fix GDB busy loop when interrupting non-stop program (PR 26199)
> 
>  gdb/infrun.c                             | 116 +++++++++++++++++++++----------
>  gdb/remote.c                             |  15 +++-
>  gdb/target.c                             |   2 +-
>  gdb/testsuite/gdb.multi/multi-target.c   |   4 +-
>  gdb/testsuite/gdb.multi/multi-target.exp |  76 ++++++++++++++++++++
>  5 files changed, 173 insertions(+), 40 deletions(-)
> 
> 
> base-commit: ad8464f799a4c96c7ab8bdfec3f95846cf54f9b0
> -- 
> 2.14.5
> 

I am not sure why I didn't see this earlier, but gdb.multi/multi-target.exp
fails for me with my ASan-enabled build, using current master.  There is
a use after free.  Do you see the same?  I've attached the ASan log.

Simon

[-- Attachment #2: asan.log --]
[-- Type: text/x-log, Size: 8581 bytes --]

==18555==ERROR: AddressSanitizer: heap-use-after-free on address 0x621004670aa8 at pc 0x0000007ab125 bp 0x7ffdecaecd20 sp 0x7ffdecaecd10
READ of size 4 at 0x621004670aa8 thread T0
    #0 0x7ab124 in dwarf2_frame_this_id /home/smarchi/src/binutils-gdb/gdb/dwarf2/frame.c:1228
    #1 0x983ec5 in compute_frame_id /home/smarchi/src/binutils-gdb/gdb/frame.c:550
    #2 0x9841ee in get_frame_id(frame_info*) /home/smarchi/src/binutils-gdb/gdb/frame.c:582
    #3 0x1093faa in scoped_restore_current_thread::scoped_restore_current_thread() /home/smarchi/src/binutils-gdb/gdb/thread.c:1462
    #4 0xaee5ba in fetch_inferior_event(void*) /home/smarchi/src/binutils-gdb/gdb/infrun.c:3968
    #5 0xaa990b in inferior_event_handler(inferior_event_type, void*) /home/smarchi/src/binutils-gdb/gdb/inf-loop.c:43
    #6 0xea61b6 in remote_async_serial_handler /home/smarchi/src/binutils-gdb/gdb/remote.c:14161
    #7 0xefca8a in run_async_handler_and_reschedule /home/smarchi/src/binutils-gdb/gdb/ser-base.c:137
    #8 0xefcd23 in fd_event /home/smarchi/src/binutils-gdb/gdb/ser-base.c:188
    #9 0x15a7416 in handle_file_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:548
    #10 0x15a7c36 in gdb_wait_for_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:673
    #11 0x15a5dbb in gdb_do_one_event() /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:215
    #12 0xbfe62d in start_event_loop /home/smarchi/src/binutils-gdb/gdb/main.c:356
    #13 0xbfe935 in captured_command_loop /home/smarchi/src/binutils-gdb/gdb/main.c:416
    #14 0xc01d39 in captured_main /home/smarchi/src/binutils-gdb/gdb/main.c:1253
    #15 0xc01dc9 in gdb_main(captured_main_args*) /home/smarchi/src/binutils-gdb/gdb/main.c:1268
    #16 0x414ddd in main /home/smarchi/src/binutils-gdb/gdb/gdb.c:32
    #17 0x7f590110b82f in __libc_start_main ../csu/libc-start.c:291
    #18 0x414bd8 in _start (/home/smarchi/build/binutils-gdb/gdb/gdb+0x414bd8)

0x621004670aa8 is located 424 bytes inside of 4064-byte region [0x621004670900,0x6210046718e0)
freed by thread T0 here:
    #0 0x7f5903c42c7f in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10bc7f)
    #1 0x98fb0c in xfree<void> /home/smarchi/src/binutils-gdb/gdb/../gdbsupport/common-utils.h:62
    #2 0x160804b in call_freefun /home/smarchi/src/binutils-gdb/libiberty/obstack.c:103
    #3 0x1608a72 in _obstack_free /home/smarchi/src/binutils-gdb/libiberty/obstack.c:280
    #4 0x98af3b in reinit_frame_cache() /home/smarchi/src/binutils-gdb/gdb/frame.c:1864
    #5 0x109341c in switch_to_no_thread() /home/smarchi/src/binutils-gdb/gdb/thread.c:1301
    #6 0xad0cf2 in switch_to_inferior_no_thread(inferior*) /home/smarchi/src/binutils-gdb/gdb/inferior.c:612
    #7 0xe76c9f in remote_unpush_target /home/smarchi/src/binutils-gdb/gdb/remote.c:5521
    #8 0xe8d997 in remote_target::readchar(int) /home/smarchi/src/binutils-gdb/gdb/remote.c:9138
    #9 0xe90082 in remote_target::getpkt_or_notif_sane_1(std::vector<char, gdb::default_init_allocator<char, std::allocator<char> > >*, int, int, int*) /home/smarchi/src/binutils-gdb/gdb/remote.c:9684
    #10 0xe90adc in remote_target::getpkt_sane(std::vector<char, gdb::default_init_allocator<char, std::allocator<char> > >*, int) /home/smarchi/src/binutils-gdb/gdb/remote.c:9791
    #11 0xe8fe58 in remote_target::getpkt(std::vector<char, gdb::default_init_allocator<char, std::allocator<char> > >*, int) /home/smarchi/src/binutils-gdb/gdb/remote.c:9624
    #12 0xe8c4b6 in remote_target::remote_read_bytes_1(unsigned long, unsigned char*, unsigned long, int, unsigned long*) /home/smarchi/src/binutils-gdb/gdb/remote.c:8861
    #13 0xe8cd1f in remote_target::remote_read_bytes(unsigned long, unsigned char*, unsigned long, int, unsigned long*) /home/smarchi/src/binutils-gdb/gdb/remote.c:8988
    #14 0xe96134 in remote_target::xfer_partial(target_object, char const*, unsigned char*, unsigned char const*, unsigned long, unsigned long, unsigned long*) /home/smarchi/src/binutils-gdb/gdb/remote.c:10988
    #15 0x104a76f in raw_memory_xfer_partial(target_ops*, unsigned char*, unsigned char const*, unsigned long, long, unsigned long*) /home/smarchi/src/binutils-gdb/gdb/target.c:918
    #16 0x104b65c in target_xfer_partial(target_ops*, target_object, char const*, unsigned char*, unsigned char const*, unsigned long, unsigned long, unsigned long*) /home/smarchi/src/binutils-gdb/gdb/target.c:1148
    #17 0x104c4b0 in target_read_partial /home/smarchi/src/binutils-gdb/gdb/target.c:1379
    #18 0x104c68e in target_read(target_ops*, target_object, char const*, unsigned char*, unsigned long, long) /home/smarchi/src/binutils-gdb/gdb/target.c:1419
    #19 0x104bd7a in target_read_raw_memory(unsigned long, unsigned char*, long) /home/smarchi/src/binutils-gdb/gdb/target.c:1252
    #20 0x761b32 in dcache_read_line /home/smarchi/src/binutils-gdb/gdb/dcache.c:336
    #21 0x761ebf in dcache_peek_byte /home/smarchi/src/binutils-gdb/gdb/dcache.c:403
    #22 0x76244d in dcache_read_memory_partial(target_ops*, dcache_struct*, unsigned long, unsigned char*, unsigned long, unsigned long*) /home/smarchi/src/binutils-gdb/gdb/dcache.c:484
    #23 0x104ae00 in memory_xfer_partial_1 /home/smarchi/src/binutils-gdb/gdb/target.c:1033
    #24 0x104b03d in memory_xfer_partial /home/smarchi/src/binutils-gdb/gdb/target.c:1076
    #25 0x104b5c7 in target_xfer_partial(target_ops*, target_object, char const*, unsigned char*, unsigned char const*, unsigned long, unsigned long, unsigned long*) /home/smarchi/src/binutils-gdb/gdb/target.c:1133
    #26 0x11e85c3 in read_value_memory(value*, long, int, unsigned long, unsigned char*, unsigned long) /home/smarchi/src/binutils-gdb/gdb/valops.c:956
    #27 0x121eb88 in value_fetch_lazy_memory /home/smarchi/src/binutils-gdb/gdb/value.c:3764
    #28 0x121fa89 in value_fetch_lazy(value*) /home/smarchi/src/binutils-gdb/gdb/value.c:3910
    #29 0x1214bcc in value_optimized_out(value*) /home/smarchi/src/binutils-gdb/gdb/value.c:1411

previously allocated by thread T0 here:
    #0 0x7f5903c43078 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c078)
    #1 0x4a1c73 in xmalloc /home/smarchi/src/binutils-gdb/gdb/alloc.c:60
    #2 0x1607f59 in call_chunkfun /home/smarchi/src/binutils-gdb/libiberty/obstack.c:94
    #3 0x1608104 in _obstack_begin_worker /home/smarchi/src/binutils-gdb/libiberty/obstack.c:141
    #4 0x16083b7 in _obstack_begin /home/smarchi/src/binutils-gdb/libiberty/obstack.c:164
    #5 0x98af5a in reinit_frame_cache() /home/smarchi/src/binutils-gdb/gdb/frame.c:1865
    #6 0x109348a in switch_to_thread(thread_info*) /home/smarchi/src/binutils-gdb/gdb/thread.c:1316
    #7 0xad0ed4 in inferior_command /home/smarchi/src/binutils-gdb/gdb/inferior.c:636
    #8 0x6524e3 in do_const_cfunc /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:95
    #9 0x65ae7c in cmd_func(cmd_list_element*, char const*, int) /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:2187
    #10 0x10a85fd in execute_command(char const*, int) /home/smarchi/src/binutils-gdb/gdb/top.c:668
    #11 0x94bb32 in command_handler(char const*) /home/smarchi/src/binutils-gdb/gdb/event-top.c:588
    #12 0x94c436 in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) /home/smarchi/src/binutils-gdb/gdb/event-top.c:773
    #13 0x94a716 in gdb_rl_callback_handler /home/smarchi/src/binutils-gdb/gdb/event-top.c:219
    #14 0x12afe46 in rl_callback_read_char /home/smarchi/src/binutils-gdb/readline/readline/callback.c:281
    #15 0x94a28a in gdb_rl_callback_read_char_wrapper_noexcept /home/smarchi/src/binutils-gdb/gdb/event-top.c:177
    #16 0x94a488 in gdb_rl_callback_read_char_wrapper /home/smarchi/src/binutils-gdb/gdb/event-top.c:194
    #17 0x94b6e1 in stdin_event_handler(int, void*) /home/smarchi/src/binutils-gdb/gdb/event-top.c:516
    #18 0x15a7416 in handle_file_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:548
    #19 0x15a7c36 in gdb_wait_for_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:673
    #20 0x15a5dbb in gdb_do_one_event() /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:215
    #21 0xbfe62d in start_event_loop /home/smarchi/src/binutils-gdb/gdb/main.c:356
    #22 0xbfe935 in captured_command_loop /home/smarchi/src/binutils-gdb/gdb/main.c:416
    #23 0xc01d39 in captured_main /home/smarchi/src/binutils-gdb/gdb/main.c:1253
    #24 0xc01dc9 in gdb_main(captured_main_args*) /home/smarchi/src/binutils-gdb/gdb/main.c:1268
    #25 0x414ddd in main /home/smarchi/src/binutils-gdb/gdb/gdb.c:32
    #26 0x7f590110b82f in __libc_start_main ../csu/libc-start.c:291

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199)
  2020-07-06 21:28 ` [PATCH 0/7] " Simon Marchi
@ 2020-07-07  0:25   ` Pedro Alves
  2020-07-07  1:27     ` Pedro Alves
  0 siblings, 1 reply; 15+ messages in thread
From: Pedro Alves @ 2020-07-07  0:25 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

[-- Attachment #1: Type: text/plain, Size: 20186 bytes --]

On 7/6/20 10:28 PM, Simon Marchi wrote:
> On 2020-07-06 3:02 p.m., Pedro Alves wrote:
>> This patch series fixes PR 26199, a GDB 10 blocker.
>>
>> I discussed how to fix this with Simon, and we came to the conclusion
>> that we can fix it by removing code.  Easy.  :-) That's the last patch
>> in the series.
>>
>> Well, not so easy, actually... :-/
>>
>> Doing that alone caused gdb.multi/multi-target.exp to regress.  And
>> the reason was that the fix for PR 261299 made that testcase trip on a
>> few latent bugs...  One of those bugs also caused a similar 100% cpu
>> busy loop.
>>
>> And then, while fixing those, I added a new test scenario to
>> gdb.multi/multi-target.exp to exercise the TARGET_WAITKIND_NO_RESUMED
>> handling fixes in this series.  That new test requires sending Ctrl-C
>> to GDB after the test is done with, in order to cleanly kill gdbserver
>> via "monitor exit".  But, that Ctrl-C didn't work, due to an issue
>> with GDB's terminal handling, GDB would just hang...
>>
>> That's all fixed by this series.
>>
>> Pedro Alves (6):
>>   Fix spurious unhandled remote %Stop notifications
>>   Fix latent bug in target_pass_ctrlc
>>   Avoid constant stream of TARGET_WAITKIND_NO_RESUMED
>>   Fix handle_no_resumed w/ multiple targets
>>   Make handle_no_resumed transfer terminal
>>   Testcase for previous handle_no_resumed fixes
>>
>> Simon Marchi (1):
>>   Fix GDB busy loop when interrupting non-stop program (PR 26199)
>>
>>  gdb/infrun.c                             | 116 +++++++++++++++++++++----------
>>  gdb/remote.c                             |  15 +++-
>>  gdb/target.c                             |   2 +-
>>  gdb/testsuite/gdb.multi/multi-target.c   |   4 +-
>>  gdb/testsuite/gdb.multi/multi-target.exp |  76 ++++++++++++++++++++
>>  5 files changed, 173 insertions(+), 40 deletions(-)
>>
>>
>> base-commit: ad8464f799a4c96c7ab8bdfec3f95846cf54f9b0
>> -- 
>> 2.14.5
>>
> I am not sure why I didn't see this earlier, but gdb.multi/multi-target.exp
> fails for me with my ASan-enabled build, using current master.  There is
> a use after free.  Do you see the same?  I've attached the ASan log.

I can reproduce it.

This is triggering right after GDB sends "monitor exit" to
tear down gdbserver.  If GDBserver is fast enough to exit and
close its end of the socket, then gdb may end up trying to
send a packet to gdbserver, which makes gdb realize the socket
is closed, close the remote target, and throw an exception.

I'm attaching a patch one can use to reproduce this more
easily.  Once gdb_interact kicks, you can attach to
gdb from another terminal, and then issue "monitor exit"
from the inferior gdb.

> 
> ==18555==ERROR: AddressSanitizer: heap-use-after-free on address 0x621004670aa8 at pc 0x0000007ab125 bp 0x7ffdecaecd20 sp 0x7ffdecaecd10
> READ of size 4 at 0x621004670aa8 thread T0
>     #0 0x7ab124 in dwarf2_frame_this_id /home/smarchi/src/binutils-gdb/gdb/dwarf2/frame.c:1228
>     #1 0x983ec5 in compute_frame_id /home/smarchi/src/binutils-gdb/gdb/frame.c:550
>     #2 0x9841ee in get_frame_id(frame_info*) /home/smarchi/src/binutils-gdb/gdb/frame.c:582
>     #3 0x1093faa in scoped_restore_current_thread::scoped_restore_current_thread() /home/smarchi/src/binutils-gdb/gdb/thread.c:1462
>     #4 0xaee5ba in fetch_inferior_event(void*) /home/smarchi/src/binutils-gdb/gdb/infrun.c:3968
>     #5 0xaa990b in inferior_event_handler(inferior_event_type, void*) /home/smarchi/src/binutils-gdb/gdb/inf-loop.c:43
>     #6 0xea61b6 in remote_async_serial_handler /home/smarchi/src/binutils-gdb/gdb/remote.c:14161
>     #7 0xefca8a in run_async_handler_and_reschedule /home/smarchi/src/binutils-gdb/gdb/ser-base.c:137
>     #8 0xefcd23 in fd_event /home/smarchi/src/binutils-gdb/gdb/ser-base.c:188
>     #9 0x15a7416 in handle_file_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:548
>     #10 0x15a7c36 in gdb_wait_for_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:673
>     #11 0x15a5dbb in gdb_do_one_event() /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:215
>     #12 0xbfe62d in start_event_loop /home/smarchi/src/binutils-gdb/gdb/main.c:356
>     #13 0xbfe935 in captured_command_loop /home/smarchi/src/binutils-gdb/gdb/main.c:416
>     #14 0xc01d39 in captured_main /home/smarchi/src/binutils-gdb/gdb/main.c:1253
>     #15 0xc01dc9 in gdb_main(captured_main_args*) /home/smarchi/src/binutils-gdb/gdb/main.c:1268
>     #16 0x414ddd in main /home/smarchi/src/binutils-gdb/gdb/gdb.c:32
>     #17 0x7f590110b82f in __libc_start_main ../csu/libc-start.c:291
>     #18 0x414bd8 in _start (/home/smarchi/build/binutils-gdb/gdb/gdb+0x414bd8)

So what happens is that above, we're in dwarf2_frame_this_id, just after the dwarf2_frame_cache
call.  So you would think that the "cache" variable would be good.  But it's not, that's
what has been freed.  But why has it been freed?  See below.

> 
> 0x621004670aa8 is located 424 bytes inside of 4064-byte region [0x621004670900,0x6210046718e0)
> freed by thread T0 here:
>     #0 0x7f5903c42c7f in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10bc7f)
>     #1 0x98fb0c in xfree<void> /home/smarchi/src/binutils-gdb/gdb/../gdbsupport/common-utils.h:62
>     #2 0x160804b in call_freefun /home/smarchi/src/binutils-gdb/libiberty/obstack.c:103
>     #3 0x1608a72 in _obstack_free /home/smarchi/src/binutils-gdb/libiberty/obstack.c:280
>     #4 0x98af3b in reinit_frame_cache() /home/smarchi/src/binutils-gdb/gdb/frame.c:1864
>     #5 0x109341c in switch_to_no_thread() /home/smarchi/src/binutils-gdb/gdb/thread.c:1301
>     #6 0xad0cf2 in switch_to_inferior_no_thread(inferior*) /home/smarchi/src/binutils-gdb/gdb/inferior.c:612
>     #7 0xe76c9f in remote_unpush_target /home/smarchi/src/binutils-gdb/gdb/remote.c:5521
>     #8 0xe8d997 in remote_target::readchar(int) /home/smarchi/src/binutils-gdb/gdb/remote.c:9138
>     #9 0xe90082 in remote_target::getpkt_or_notif_sane_1(std::vector<char, gdb::default_init_allocator<char, std::allocator<char> > >*, int, int, int*) /home/smarchi/src/binutils-gdb/gdb/remote.c:9684
>     #10 0xe90adc in remote_target::getpkt_sane(std::vector<char, gdb::default_init_allocator<char, std::allocator<char> > >*, int) /home/smarchi/src/binutils-gdb/gdb/remote.c:9791
>     #11 0xe8fe58 in remote_target::getpkt(std::vector<char, gdb::default_init_allocator<char, std::allocator<char> > >*, int) /home/smarchi/src/binutils-gdb/gdb/remote.c:9624
>     #12 0xe8c4b6 in remote_target::remote_read_bytes_1(unsigned long, unsigned char*, unsigned long, int, unsigned long*) /home/smarchi/src/binutils-gdb/gdb/remote.c:8861
>     #13 0xe8cd1f in remote_target::remote_read_bytes(unsigned long, unsigned char*, unsigned long, int, unsigned long*) /home/smarchi/src/binutils-gdb/gdb/remote.c:8988
>     #14 0xe96134 in remote_target::xfer_partial(target_object, char const*, unsigned char*, unsigned char const*, unsigned long, unsigned long, unsigned long*) /home/smarchi/src/binutils-gdb/gdb/remote.c:10988
>     #15 0x104a76f in raw_memory_xfer_partial(target_ops*, unsigned char*, unsigned char const*, unsigned long, long, unsigned long*) /home/smarchi/src/binutils-gdb/gdb/target.c:918
>     #16 0x104b65c in target_xfer_partial(target_ops*, target_object, char const*, unsigned char*, unsigned char const*, unsigned long, unsigned long, unsigned long*) /home/smarchi/src/binutils-gdb/gdb/target.c:1148
>     #17 0x104c4b0 in target_read_partial /home/smarchi/src/binutils-gdb/gdb/target.c:1379
>     #18 0x104c68e in target_read(target_ops*, target_object, char const*, unsigned char*, unsigned long, long) /home/smarchi/src/binutils-gdb/gdb/target.c:1419
>     #19 0x104bd7a in target_read_raw_memory(unsigned long, unsigned char*, long) /home/smarchi/src/binutils-gdb/gdb/target.c:1252
>     #20 0x761b32 in dcache_read_line /home/smarchi/src/binutils-gdb/gdb/dcache.c:336
>     #21 0x761ebf in dcache_peek_byte /home/smarchi/src/binutils-gdb/gdb/dcache.c:403
>     #22 0x76244d in dcache_read_memory_partial(target_ops*, dcache_struct*, unsigned long, unsigned char*, unsigned long, unsigned long*) /home/smarchi/src/binutils-gdb/gdb/dcache.c:484
>     #23 0x104ae00 in memory_xfer_partial_1 /home/smarchi/src/binutils-gdb/gdb/target.c:1033
>     #24 0x104b03d in memory_xfer_partial /home/smarchi/src/binutils-gdb/gdb/target.c:1076
>     #25 0x104b5c7 in target_xfer_partial(target_ops*, target_object, char const*, unsigned char*, unsigned char const*, unsigned long, unsigned long, unsigned long*) /home/smarchi/src/binutils-gdb/gdb/target.c:1133
>     #26 0x11e85c3 in read_value_memory(value*, long, int, unsigned long, unsigned char*, unsigned long) /home/smarchi/src/binutils-gdb/gdb/valops.c:956
>     #27 0x121eb88 in value_fetch_lazy_memory /home/smarchi/src/binutils-gdb/gdb/value.c:3764
>     #28 0x121fa89 in value_fetch_lazy(value*) /home/smarchi/src/binutils-gdb/gdb/value.c:3910
>     #29 0x1214bcc in value_optimized_out(value*) /home/smarchi/src/binutils-gdb/gdb/value.c:1411

It is not immediately obvious because the backtrace is trimmed, but this backtrace that lead
to the free is _within_ the dwarf2_frame_this_id -> dwarf2_frame_cache:

(top-gdb) bt 
#0  reinit_frame_cache () at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/frame.c:1855
#1  0x00000000014ff7b0 in switch_to_no_thread () at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/thread.c:1301
#2  0x0000000000f66d3e in switch_to_inferior_no_thread (inf=0x615000338180) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/inferior.c:626
#3  0x00000000012f3826 in remote_unpush_target (target=0x6170000c5900) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/remote.c:5521
#4  0x00000000013097e0 in remote_target::readchar (this=0x6170000c5900, timeout=2) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/remote.c:9137
#5  0x000000000130be4d in remote_target::getpkt_or_notif_sane_1 (this=0x6170000c5900, buf=0x6170000c5918, forever=0, expecting_notif=0, is_notif=0x0) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/remote.c:9683
#6  0x000000000130c8ab in remote_target::getpkt_sane (this=0x6170000c5900, buf=0x6170000c5918, forever=0) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/remote.c:9790
#7  0x000000000130bc0d in remote_target::getpkt (this=0x6170000c5900, buf=0x6170000c5918, forever=0) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/remote.c:9623
#8  0x000000000130838e in remote_target::remote_read_bytes_1 (this=0x6170000c5900, memaddr=0x7fffffffcdc0, myaddr=0x6080000ad3bc "", len_units=64, unit_size=1, xfered_len_units=0x7fff6a29b9a0) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/remote.c:8860
#9  0x0000000001308bd2 in remote_target::remote_read_bytes (this=0x6170000c5900, memaddr=0x7fffffffcdc0, myaddr=0x6080000ad3bc "", len=64, unit_size=1, xfered_len=0x7fff6a29b9a0) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/remote.c:8987
#10 0x0000000001311ed1 in remote_target::xfer_partial (this=0x6170000c5900, object=TARGET_OBJECT_MEMORY, annex=0x0, readbuf=0x6080000ad3bc "", writebuf=0x0, offset=140737488342464, len=64, xfered_len=0x7fff6a29b9a0) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/remote.c:10988
#11 0x00000000014ba969 in raw_memory_xfer_partial (ops=0x6170000c5900, readbuf=0x6080000ad3bc "", writebuf=0x0, memaddr=140737488342464, len=64, xfered_len=0x7fff6a29b9a0) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/target.c:918
#12 0x00000000014bb720 in target_xfer_partial (ops=0x6170000c5900, object=TARGET_OBJECT_RAW_MEMORY, annex=0x0, readbuf=0x6080000ad3bc "", writebuf=0x0, offset=140737488342464, len=64, xfered_len=0x7fff6a29b9a0) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/target.c:1148
#13 0x00000000014bc3b5 in target_read_partial (ops=0x6170000c5900, object=TARGET_OBJECT_RAW_MEMORY, annex=0x0, buf=0x6080000ad3bc "", offset=140737488342464, len=64, xfered_len=0x7fff6a29b9a0) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/target.c:1380
#14 0x00000000014bc593 in target_read (ops=0x6170000c5900, object=TARGET_OBJECT_RAW_MEMORY, annex=0x0, buf=0x6080000ad3bc "", offset=140737488342464, len=64) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/target.c:1419
#15 0x00000000014bbd4d in target_read_raw_memory (memaddr=0x7fffffffcdc0, myaddr=0x6080000ad3bc "", len=64) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/target.c:1252
#16 0x0000000000bf27df in dcache_read_line (dcache=0x6060001eddc0, db=0x6080000ad3a0) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/dcache.c:336
#17 0x0000000000bf2b72 in dcache_peek_byte (dcache=0x6060001eddc0, addr=0x7fffffffcdd8, ptr=0x6020001231b0 "") at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/dcache.c:403
#18 0x0000000000bf3103 in dcache_read_memory_partial (ops=0x6170000c5900, dcache=0x6060001eddc0, memaddr=0x7fffffffcdd8, myaddr=0x6020001231b0 "", len=8, xfered_len=0x7fff6a29bf20) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/dcache.c:484
#19 0x00000000014bafe9 in memory_xfer_partial_1 (ops=0x6170000c5900, object=TARGET_OBJECT_STACK_MEMORY, readbuf=0x6020001231b0 "", writebuf=0x0, memaddr=140737488342488, len=8, xfered_len=0x7fff6a29bf20) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/target.c:1034
#20 0x00000000014bb212 in memory_xfer_partial (ops=0x6170000c5900, object=TARGET_OBJECT_STACK_MEMORY, readbuf=0x6020001231b0 "", writebuf=0x0, memaddr=140737488342488, len=8, xfered_len=0x7fff6a29bf20) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/target.c:1076
#21 0x00000000014bb6b3 in target_xfer_partial (ops=0x6170000c5900, object=TARGET_OBJECT_STACK_MEMORY, annex=0x0, readbuf=0x6020001231b0 "", writebuf=0x0, offset=140737488342488, len=8, xfered_len=0x7fff6a29bf20) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/target.c:1133
#22 0x000000000164564d in read_value_memory (val=0x60f000029440, bit_offset=0, stack=1, memaddr=0x7fffffffcdd8, buffer=0x6020001231b0 "", length=8) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/valops.c:956
#23 0x0000000001680fff in value_fetch_lazy_memory (val=0x60f000029440) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/value.c:3764
#24 0x0000000001681efd in value_fetch_lazy (val=0x60f000029440) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/value.c:3910
#25 0x0000000001676143 in value_optimized_out (value=0x60f000029440) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/value.c:1411
#26 0x0000000000e0fcb8 in frame_register_unwind (next_frame=0x6210066bfde0, regnum=16, optimizedp=0x7fff6a29c200, unavailablep=0x7fff6a29c240, lvalp=0x7fff6a29c2c0, addrp=0x7fff6a29c300, realnump=0x7fff6a29c280, bufferp=0x7fff6a29c3a0 "@\304)j\377\177") at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/frame.c:1144
#27 0x0000000000e10418 in frame_unwind_register (next_frame=0x6210066bfde0, regnum=16, buf=0x7fff6a29c3a0 "@\304)j\377\177") at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/frame.c:1196
#28 0x0000000000f00431 in i386_unwind_pc (gdbarch=0x6210043d0110, next_frame=0x6210066bfde0) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/i386-tdep.c:1969
#29 0x0000000000e39724 in gdbarch_unwind_pc (gdbarch=0x6210043d0110, next_frame=0x6210066bfde0) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/gdbarch.c:3056
#30 0x0000000000c2ea90 in dwarf2_tailcall_sniffer_first (this_frame=0x6210066bfde0, tailcall_cachep=0x6210066bfee0, entry_cfa_sp_offsetp=0x0) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/dwarf2/frame-tailcall.c:423
#31 0x0000000000c36bdb in dwarf2_frame_cache (this_frame=0x6210066bfde0, this_cache=0x6210066bfdf8) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/dwarf2/frame.c:1198
#32 0x0000000000c36eb3 in dwarf2_frame_this_id (this_frame=0x6210066bfde0, this_cache=0x6210066bfdf8, this_id=0x6210066bfe40) at /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/dwarf2/frame.c:1226

Note that remote_target::readchar in frame #3 throws TARGET_CLOSE_ERROR after
that remote_unpush_target in frame #3 returns.

The problem is that that TARGET_CLOSE_ERROR is swallowed by value_optimized_out
in frame #25.

If we fix that one, then we run into dwarf2_tailcall_sniffer_first swallowing
the exception in frame #30 too.

The attached patch fixes that issue for me, but I've not
run the whole testsuite with it yet.  I'm not 100% sure that that
makes us catch all the important exceptions.  I've thought before
about making TARGET_CLOSE_ERROR be more like a QUIT exception, which
is not caught by catch gdb_exception_error, but I've resisted exactly
because it tends to be wrong to swallow too much, I think.

However, with the fix, the testcase now runs into another Asan-reported issue:

=================================================================
==9211==ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000b1080 at pc 0x000000dd0344 bp 0x7ffe4bebca90 sp 0x7ffe4bebca80
READ of size 4 at 0x6160000b1080 thread T0
    #0 0xdd0343 in refcounted_object::incref() /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/../gdbsupport/refcounted-object.h:34
    #1 0x150066f in scoped_restore_current_thread::scoped_restore_current_thread() /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/thread.c:1471
    #2 0xf83564 in fetch_inferior_event() /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/infrun.c:3952
    #3 0xf40736 in inferior_event_handler(inferior_event_type) /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/inf-loop.c:42
    #4 0x1321e75 in remote_async_serial_handler /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/remote.c:14160
    #5 0x1371849 in run_async_handler_and_reschedule /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/ser-base.c:137
    #6 0x1371ae2 in fd_event /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/ser-base.c:188
    #7 0x19fc1f7 in handle_file_event /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdbsupport/event-loop.cc:548
    #8 0x19fca14 in gdb_wait_for_event /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdbsupport/event-loop.cc:673
    #9 0x19fab94 in gdb_do_one_event() /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdbsupport/event-loop.cc:215
    #10 0x1087704 in start_event_loop /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/main.c:356
    #11 0x10879f5 in captured_command_loop /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/main.c:416
    #12 0x108aef1 in captured_main /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/main.c:1253
    #13 0x108af81 in gdb_main(captured_main_args*) /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/main.c:1268
    #14 0x8af9fa in main /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/gdb.c:32
    #15 0x7f07d5efbfe9 in __libc_start_main (/lib64/libc.so.6+0x20fe9)
    #16 0x8af809 in _start (/home/pedro/brno/pedro/gdb/binutils-gdb-2/build-asan/gdb/gdb+0x8af809)

0x6160000b1080 is located 0 bytes inside of 592-byte region [0x6160000b1080,0x6160000b12d0)
freed by thread T0 here:
    #0 0x7f07d97966d8 in operator delete(void*, unsigned long) (/lib64/libasan.so.4+0xe16d8)
    #1 0x14f7147 in delete_thread_1 /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/thread.c:452
    #2 0x14f7171 in delete_thread(thread_info*) /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/thread.c:460
    #3 0xf6278d in exit_inferior_1 /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/inferior.c:204
    #4 0xf62b5b in exit_inferior(inferior*) /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/inferior.c:236
    #5 0x14c32a6 in generic_mourn_inferior() /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/target.c:3119
    #6 0x12f388a in remote_unpush_target /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/remote.c:5523
    #7 0x1309835 in remote_target::readchar(int) /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/remote.c:9137


Odd, kind of looks like we're mishandling the thread_info refcounts.

[-- Attachment #2: 0001-Crash-reproducer.patch --]
[-- Type: text/x-patch, Size: 1904 bytes --]

From 70b049caa3cf74f5ada6eeaa494caca09247479c Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Mon, 6 Jul 2020 23:07:40 +0100
Subject: [PATCH] Crash reproducer

---
 gdb/remote.c                             | 2 ++
 gdb/testsuite/gdb.multi/multi-target.exp | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/gdb/remote.c b/gdb/remote.c
index 59075cb09f..79f5430bb0 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -11281,6 +11281,8 @@ remote_target::rcmd (const char *command, struct ui_file *outbuf)
   if (putpkt (rs->buf) < 0)
     error (_("Communication problem with target."));
 
+  error ("boo");
+
   /* get/display the response */
   while (1)
     {
diff --git a/gdb/testsuite/gdb.multi/multi-target.exp b/gdb/testsuite/gdb.multi/multi-target.exp
index d19cee6595..d1d7b9a57e 100644
--- a/gdb/testsuite/gdb.multi/multi-target.exp
+++ b/gdb/testsuite/gdb.multi/multi-target.exp
@@ -111,6 +111,7 @@ proc cleanup_gdbservers { } {
     foreach { inferior_id spawn_id } $server_spawn_ids {
 	set server_spawn_id $spawn_id
 	gdb_test "inferior $inferior_id"
+	gdb_interact
 	gdbserver_exit 0
     }
     set server_spawn_ids [list]
@@ -520,6 +521,7 @@ with_test_prefix "continue" {
 	test_continue ${non-stop}
     }
 }
+return
 
 # Some basic all-stop Ctrl-C tests.
 with_test_prefix "interrupt" {

base-commit: ad8464f799a4c96c7ab8bdfec3f95846cf54f9b0
prerequisite-patch-id: 32ffdda7d7d774bc4df88bf848bcb796559b53ce
prerequisite-patch-id: 02021b74355b70debd344a6e445285c67dfef7d6
prerequisite-patch-id: c87fcf5a54f6805967cbf8ab107606c57d9ecf52
prerequisite-patch-id: ac7dee583d0ffa519c9d1cd89d27664bca68d8c1
prerequisite-patch-id: eac59ae2ea85d2d51e5be1b03e88a5641cc12c22
prerequisite-patch-id: 13da42ad04dc8e2e3bd6a556a0be0e17cf23669b
prerequisite-patch-id: fd3f09fdb58ddc1c595ea014716851f4c8fca48c
prerequisite-patch-id: 55b398673bd7edefb85d383c82785b668588e9c2
-- 
2.14.5


[-- Attachment #3: 0001-Fix-crash-part-1.patch --]
[-- Type: text/x-patch, Size: 2031 bytes --]

From fa0b349ab8fec78e3296fdd8042001cc54511032 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Tue, 7 Jul 2020 01:19:55 +0100
Subject: [PATCH] Fix crash, part 1

---
 gdb/dwarf2/frame-tailcall.c | 6 ++++--
 gdb/value.c                 | 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/gdb/dwarf2/frame-tailcall.c b/gdb/dwarf2/frame-tailcall.c
index 16dba2b201..cb9fd4a506 100644
--- a/gdb/dwarf2/frame-tailcall.c
+++ b/gdb/dwarf2/frame-tailcall.c
@@ -377,7 +377,7 @@ dwarf2_tailcall_sniffer_first (struct frame_info *this_frame,
      get_frame_address_in_block will decrease it by 1 in such case.  */
   this_pc = get_frame_address_in_block (this_frame);
 
-  /* Catch any unwinding errors.  */
+  /* Catch NO_ENTRY_VALUE_ERROR thrown by call_site_find_chain.  */
   try
     {
       int sp_regnum;
@@ -439,7 +439,9 @@ dwarf2_tailcall_sniffer_first (struct frame_info *this_frame,
     {
       if (entry_values_debug)
 	exception_print (gdb_stdout, except);
-      return;
+      if (except.error == NO_ENTRY_VALUE_ERROR)
+	return;
+      throw;
     }
 
   /* Ambiguous unwind or unambiguous unwind verified as matching.  */
diff --git a/gdb/value.c b/gdb/value.c
index 97a099ddbd..e22ef96c4c 100644
--- a/gdb/value.c
+++ b/gdb/value.c
@@ -1413,6 +1413,8 @@ value_optimized_out (struct value *value)
       catch (const gdb_exception_error &ex)
 	{
 	  /* Fall back to checking value->optimized_out.  */
+	  if (ex.error != OPTIMIZED_OUT_ERROR)
+	    throw;
 	}
     }
 

base-commit: ad8464f799a4c96c7ab8bdfec3f95846cf54f9b0
prerequisite-patch-id: 32ffdda7d7d774bc4df88bf848bcb796559b53ce
prerequisite-patch-id: 02021b74355b70debd344a6e445285c67dfef7d6
prerequisite-patch-id: c87fcf5a54f6805967cbf8ab107606c57d9ecf52
prerequisite-patch-id: ac7dee583d0ffa519c9d1cd89d27664bca68d8c1
prerequisite-patch-id: eac59ae2ea85d2d51e5be1b03e88a5641cc12c22
prerequisite-patch-id: 13da42ad04dc8e2e3bd6a556a0be0e17cf23669b
prerequisite-patch-id: fd3f09fdb58ddc1c595ea014716851f4c8fca48c
-- 
2.14.5


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199)
  2020-07-07  0:25   ` Pedro Alves
@ 2020-07-07  1:27     ` Pedro Alves
  2020-07-07  1:29       ` Pedro Alves
  0 siblings, 1 reply; 15+ messages in thread
From: Pedro Alves @ 2020-07-07  1:27 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

[-- Attachment #1: Type: text/plain, Size: 4118 bytes --]

On 7/7/20 1:25 AM, Pedro Alves wrote:

> However, with the fix, the testcase now runs into another Asan-reported issue:
> 
> =================================================================
> ==9211==ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000b1080 at pc 0x000000dd0344 bp 0x7ffe4bebca90 sp 0x7ffe4bebca80
> READ of size 4 at 0x6160000b1080 thread T0
>     #0 0xdd0343 in refcounted_object::incref() /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/../gdbsupport/refcounted-object.h:34
>     #1 0x150066f in scoped_restore_current_thread::scoped_restore_current_thread() /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/thread.c:1471
>     #2 0xf83564 in fetch_inferior_event() /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/infrun.c:3952
>     #3 0xf40736 in inferior_event_handler(inferior_event_type) /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/inf-loop.c:42
>     #4 0x1321e75 in remote_async_serial_handler /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/remote.c:14160
>     #5 0x1371849 in run_async_handler_and_reschedule /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/ser-base.c:137
>     #6 0x1371ae2 in fd_event /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/ser-base.c:188
>     #7 0x19fc1f7 in handle_file_event /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdbsupport/event-loop.cc:548
>     #8 0x19fca14 in gdb_wait_for_event /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdbsupport/event-loop.cc:673
>     #9 0x19fab94 in gdb_do_one_event() /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdbsupport/event-loop.cc:215
>     #10 0x1087704 in start_event_loop /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/main.c:356
>     #11 0x10879f5 in captured_command_loop /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/main.c:416
>     #12 0x108aef1 in captured_main /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/main.c:1253
>     #13 0x108af81 in gdb_main(captured_main_args*) /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/main.c:1268
>     #14 0x8af9fa in main /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/gdb.c:32
>     #15 0x7f07d5efbfe9 in __libc_start_main (/lib64/libc.so.6+0x20fe9)
>     #16 0x8af809 in _start (/home/pedro/brno/pedro/gdb/binutils-gdb-2/build-asan/gdb/gdb+0x8af809)
> 
> 0x6160000b1080 is located 0 bytes inside of 592-byte region [0x6160000b1080,0x6160000b12d0)
> freed by thread T0 here:
>     #0 0x7f07d97966d8 in operator delete(void*, unsigned long) (/lib64/libasan.so.4+0xe16d8)
>     #1 0x14f7147 in delete_thread_1 /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/thread.c:452
>     #2 0x14f7171 in delete_thread(thread_info*) /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/thread.c:460
>     #3 0xf6278d in exit_inferior_1 /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/inferior.c:204
>     #4 0xf62b5b in exit_inferior(inferior*) /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/inferior.c:236
>     #5 0x14c32a6 in generic_mourn_inferior() /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/target.c:3119
>     #6 0x12f388a in remote_unpush_target /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/remote.c:5523
>     #7 0x1309835 in remote_target::readchar(int) /home/pedro/brno/pedro/gdb/binutils-gdb-2/build/../src/gdb/remote.c:9137
> 
> 
> Odd, kind of looks like we're mishandling the thread_info refcounts.
> 

This fixes it.  multi-target.exp now passes Asan-clean with this one on top
of "part 1" patch.  I still haven't run the full testsuite.

The issue is that the remote target is unpushed while within scoped_restore_current_thread' dtor's
get_frame_id call, which results in threads being deleted.  However, back in
scoped_restore_current_thread's ctor, we only increment the refcount after get_frame_id
returns.  Incrementing the refcounts earlier fixes it.

However, we should probably also propagate the TARGET_CLOSE_ERROR in this case.
That alone would fix it, though it seems cleaner to do both tweaks.

[-- Attachment #2: 0001-Fix-crash-part-2.patch --]
[-- Type: text/x-patch, Size: 1982 bytes --]

From 10203ffc8c57d92568b8e84b75389df25f3c4a58 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Tue, 7 Jul 2020 01:50:10 +0100
Subject: [PATCH] Fix crash, part 2

---
 gdb/thread.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/gdb/thread.c b/gdb/thread.c
index f0722d3588..1ec047e35b 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1433,15 +1433,17 @@ scoped_restore_current_thread::~scoped_restore_current_thread ()
 
 scoped_restore_current_thread::scoped_restore_current_thread ()
 {
-  m_thread = NULL;
   m_inf = current_inferior ();
+  m_inf->incref ();
 
   if (inferior_ptid != null_ptid)
     {
-      thread_info *tp = inferior_thread ();
+      m_thread = inferior_thread ();
+      m_thread->incref ();
+
       struct frame_info *frame;
 
-      m_was_stopped = tp->state == THREAD_STOPPED;
+      m_was_stopped = m_thread->state == THREAD_STOPPED;
       if (m_was_stopped
 	  && target_has_registers
 	  && target_has_stack
@@ -1466,13 +1468,14 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
 	{
 	  m_selected_frame_id = null_frame_id;
 	  m_selected_frame_level = -1;
-	}
 
-      tp->incref ();
-      m_thread = tp;
+	  /* Better let this propagate.  */
+	  if (ex.error == TARGET_CLOSE_ERROR)
+	    throw;
+	}
     }
-
-  m_inf->incref ();
+  else
+    m_thread = NULL;
 }
 
 /* See gdbthread.h.  */

base-commit: ad8464f799a4c96c7ab8bdfec3f95846cf54f9b0
prerequisite-patch-id: 32ffdda7d7d774bc4df88bf848bcb796559b53ce
prerequisite-patch-id: 02021b74355b70debd344a6e445285c67dfef7d6
prerequisite-patch-id: c87fcf5a54f6805967cbf8ab107606c57d9ecf52
prerequisite-patch-id: ac7dee583d0ffa519c9d1cd89d27664bca68d8c1
prerequisite-patch-id: eac59ae2ea85d2d51e5be1b03e88a5641cc12c22
prerequisite-patch-id: 13da42ad04dc8e2e3bd6a556a0be0e17cf23669b
prerequisite-patch-id: fd3f09fdb58ddc1c595ea014716851f4c8fca48c
prerequisite-patch-id: 55b398673bd7edefb85d383c82785b668588e9c2
-- 
2.14.5


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199)
  2020-07-07  1:27     ` Pedro Alves
@ 2020-07-07  1:29       ` Pedro Alves
  0 siblings, 0 replies; 15+ messages in thread
From: Pedro Alves @ 2020-07-07  1:29 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 7/7/20 2:27 AM, Pedro Alves wrote:
> The issue is that the remote target is unpushed while within scoped_restore_current_thread' dtor's

I meant ctor, not dtor.

> get_frame_id call, which results in threads being deleted.  However, back in
> scoped_restore_current_thread's ctor, we only increment the refcount after get_frame_id
> returns.  Incrementing the refcounts earlier fixes it.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199)
  2020-07-06 19:02 [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199) Pedro Alves
                   ` (7 preceding siblings ...)
  2020-07-06 21:28 ` [PATCH 0/7] " Simon Marchi
@ 2020-07-10 23:02 ` Pedro Alves
  8 siblings, 0 replies; 15+ messages in thread
From: Pedro Alves @ 2020-07-10 23:02 UTC (permalink / raw)
  To: gdb-patches

On 7/6/20 8:02 PM, Pedro Alves wrote:
> This patch series fixes PR 26199, a GDB 10 blocker.
> 

I've merged this series now.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/7] Fix spurious unhandled remote %Stop notifications
  2020-07-06 19:02 ` [PATCH 1/7] Fix spurious unhandled remote %Stop notifications Pedro Alves
@ 2020-12-12 22:13   ` Andrew Burgess
  2020-12-13  0:46     ` Simon Marchi
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Burgess @ 2020-12-12 22:13 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

* Pedro Alves <pedro@palves.net> [2020-07-06 20:02:46 +0100]:

> In non-stop mode, remote targets mark an async event source whose
> callback is supposed to result in calling remote_target::wait_ns to
> either process the event queue, or acknowledge an incoming %Stop
> notification.
> 
> The callback in question is remote_async_inferior_event_handler, where
> we call inferior_event_handler, to end up in fetch_inferior_event ->
> target_wait -> remote_target::wait -> remote_target::wait_ns.
> 
> A problem here however is that when debugging multiple targets,
> fetch_inferior_event can pull events out of any target picked at
> random, for event fairness.  This means that when
> remote_async_inferior_event_handler returns, remote_target::wait may
> have not been called at all, and thus pending notifications may have
> not been acked.  Because async event sources auto-clear, when
> remote_async_inferior_event_handler returns the async event handler is
> no longer marked, so the event loop won't automatically call
> remote_async_inferior_event_handler again to try to process the
> pending remote notifications/queue.  The result is that stop events
> may end up not processed, e.g., "interrupt -a" seemingly not managing
> to stop all threads.
> 
> Fix this by making remote_async_inferior_event_handler mark the event
> handler again before returning, if necessary.
> 
> Maybe a better fix would be to make async event handlers not
> auto-clear themselves, make that the responsibility of the callback,
> so that the event loop would keep calling the callback automatically.
> Or, we could try making so that fetch_inferior_event would optionally
> handle events only for the target that it got passed down via
> parameter.  However, I don't think now just before branching is the
> time to try to do any such change.
> 
> gdb/ChangeLog:
> 
> 	PR gdb/26199
> 	* remote.c (remote_target::open_1): Pass remote target pointer as
> 	data to create_async_event_handler.
> 	(remote_async_inferior_event_handler): Mark async event handler
> 	before returning if the remote target still has either pending
> 	events or unacknowledged notifications.
> ---
>  gdb/remote.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/gdb/remote.c b/gdb/remote.c
> index f7f99dc24f..59075cb09f 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -5605,7 +5605,7 @@ remote_target::open_1 (const char *name, int from_tty, int extended_p)
>  
>    /* Register extra event sources in the event loop.  */
>    rs->remote_async_inferior_event_token
> -    = create_async_event_handler (remote_async_inferior_event_handler, NULL);
> +    = create_async_event_handler (remote_async_inferior_event_handler, remote);
>    rs->notif_state = remote_notif_state_allocate (remote);
>  
>    /* Reset the target state; these things will be queried either by
> @@ -14164,6 +14164,19 @@ static void
>  remote_async_inferior_event_handler (gdb_client_data data)
>  {
>    inferior_event_handler (INF_REG_EVENT);
> +
> +  remote_target *remote = (remote_target *) data;
> +  remote_state *rs = remote->get_remote_state ();
> +
> +  /* inferior_event_handler may have consumed an event pending on the
> +     infrun side without calling target_wait on the REMOTE target, or
> +     may have pulled an event out of a different target.  Keep trying
> +     for this remote target as long it still has either pending events
> +     or unacknowledged notifications.  */
> +
> +  if (rs->notif_state->pending_event[notif_client_stop.id] != NULL
> +      || !rs->stop_reply_queue.empty ())
> +    mark_async_event_handler (rs->remote_async_inferior_event_token);
>  }

Pedro,

This patch introduced a use after free issue here.  This can be seen
by running the test:

  make check-gdb RUNTESTFLAGS="--target_board=native-gdbserver gdb.base/inferior-died.exp"

For me this fails maybe 1 in 5 times.  I've done some initial
investigation at the problem is obvious one you see the following
stack trace:

  #0  remote_state::~remote_state (this=0x338d548, __in_chrg=<optimized out>) at ../../src.dev-3/gdb/remote.c:1097
  #1  0x0000000000acf3b3 in remote_target::~remote_target (this=0x338d530, __in_chrg=<optimized out>) at ../../src.dev-3/gdb/remote.c:4078
  #2  0x0000000000acf3f6 in remote_target::~remote_target (this=0x338d530, __in_chrg=<optimized out>) at ../../src.dev-3/gdb/remote.c:4097
  #3  0x0000000000acf2fa in remote_target::close (this=0x338d530) at ../../src.dev-3/gdb/remote.c:4075
  #4  0x0000000000c75bfd in target_close (targ=0x338d530) at ../../src.dev-3/gdb/target.c:3126
  #5  0x0000000000c62ca4 in decref_target (t=0x338d530) at ../../src.dev-3/gdb/target.c:545
  #6  0x0000000000c62ec7 in target_stack::unpush (this=0x3666d50, t=0x338d530) at ../../src.dev-3/gdb/target.c:633
  #7  0x0000000000c7796c in inferior::unpush_target (this=0x3666ba0, t=0x338d530) at ../../src.dev-3/gdb/inferior.h:357
  #8  0x0000000000c62de1 in unpush_target (t=0x338d530) at ../../src.dev-3/gdb/target.c:595
  #9  0x0000000000c62ee7 in unpush_target_and_assert (target=0x338d530) at ../../src.dev-3/gdb/target.c:643
  #10 0x0000000000c62fb6 in pop_all_targets_at_and_above (stratum=process_stratum) at ../../src.dev-3/gdb/target.c:666
  #11 0x0000000000ad21a4 in remote_unpush_target (target=0x338d530) at ../../src.dev-3/gdb/remote.c:5524
  #12 0x0000000000adc619 in remote_target::mourn_inferior (this=0x338d530) at ../../src.dev-3/gdb/remote.c:9962
  #13 0x0000000000c65e79 in target_mourn_inferior (ptid=...) at ../../src.dev-3/gdb/target.c:2136
  #14 0x000000000086d0a5 in handle_inferior_event (ecs=0x7fffffffb3b0) at ../../src.dev-3/gdb/infrun.c:5234
  #15 0x0000000000869beb in fetch_inferior_event () at ../../src.dev-3/gdb/infrun.c:3863
  #16 0x000000000084a922 in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src.dev-3/gdb/inf-loop.c:42
  #17 0x0000000000ae73a9 in remote_async_inferior_event_handler (data=0x338d530) at ../../src.dev-3/gdb/remote.c:14177
  #18 0x00000000004ea759 in check_async_event_handlers () at ../../src.dev-3/gdb/async-event.c:328
  #19 0x0000000001449e7a in gdb_do_one_event () at ../../src.dev-3/gdbsupport/event-loop.cc:216
  #20 0x00000000009102d0 in start_event_loop () at ../../src.dev-3/gdb/main.c:347
  #21 0x00000000009103f0 in captured_command_loop () at ../../src.dev-3/gdb/main.c:407
  #22 0x0000000000911be6 in captured_main (data=0x7fffffffb640) at ../../src.dev-3/gdb/main.c:1239
  #23 0x0000000000911c4c in gdb_main (args=0x7fffffffb640) at ../../src.dev-3/gdb/main.c:1254
  #24 0x000000000041755d in main (argc=5, argv=0x7fffffffb748) at ../../src.dev-3/gdb/gdb.c:32

The inferior event being processed is the inferior exited event, this
is the last remote inferior, and so the remote target is unpushed.
GDB then returns to remote_async_inferior_event_handler where we hit
the code you added above which proceeds to make use of the remote
target :-/

Like I say, the problem is now obvious, but the solution less so!

Reading what you originally wrote in the patch I wondered about the
idea of having it be the call back that is responsible for marking the
async event handler as clear.

I haven't tried to fix this yet, but thought I'd share my findings so
far with you.

Thanks,
Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/7] Fix spurious unhandled remote %Stop notifications
  2020-12-12 22:13   ` Andrew Burgess
@ 2020-12-13  0:46     ` Simon Marchi
  0 siblings, 0 replies; 15+ messages in thread
From: Simon Marchi @ 2020-12-13  0:46 UTC (permalink / raw)
  To: Andrew Burgess, Pedro Alves; +Cc: gdb-patches



On 2020-12-12 5:13 p.m., Andrew Burgess wrote:
> * Pedro Alves <pedro@palves.net> [2020-07-06 20:02:46 +0100]:
> 
>> In non-stop mode, remote targets mark an async event source whose
>> callback is supposed to result in calling remote_target::wait_ns to
>> either process the event queue, or acknowledge an incoming %Stop
>> notification.
>>
>> The callback in question is remote_async_inferior_event_handler, where
>> we call inferior_event_handler, to end up in fetch_inferior_event ->
>> target_wait -> remote_target::wait -> remote_target::wait_ns.
>>
>> A problem here however is that when debugging multiple targets,
>> fetch_inferior_event can pull events out of any target picked at
>> random, for event fairness.  This means that when
>> remote_async_inferior_event_handler returns, remote_target::wait may
>> have not been called at all, and thus pending notifications may have
>> not been acked.  Because async event sources auto-clear, when
>> remote_async_inferior_event_handler returns the async event handler is
>> no longer marked, so the event loop won't automatically call
>> remote_async_inferior_event_handler again to try to process the
>> pending remote notifications/queue.  The result is that stop events
>> may end up not processed, e.g., "interrupt -a" seemingly not managing
>> to stop all threads.
>>
>> Fix this by making remote_async_inferior_event_handler mark the event
>> handler again before returning, if necessary.
>>
>> Maybe a better fix would be to make async event handlers not
>> auto-clear themselves, make that the responsibility of the callback,
>> so that the event loop would keep calling the callback automatically.
>> Or, we could try making so that fetch_inferior_event would optionally
>> handle events only for the target that it got passed down via
>> parameter.  However, I don't think now just before branching is the
>> time to try to do any such change.
>>
>> gdb/ChangeLog:
>>
>> 	PR gdb/26199
>> 	* remote.c (remote_target::open_1): Pass remote target pointer as
>> 	data to create_async_event_handler.
>> 	(remote_async_inferior_event_handler): Mark async event handler
>> 	before returning if the remote target still has either pending
>> 	events or unacknowledged notifications.
>> ---
>>  gdb/remote.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/gdb/remote.c b/gdb/remote.c
>> index f7f99dc24f..59075cb09f 100644
>> --- a/gdb/remote.c
>> +++ b/gdb/remote.c
>> @@ -5605,7 +5605,7 @@ remote_target::open_1 (const char *name, int from_tty, int extended_p)
>>  
>>    /* Register extra event sources in the event loop.  */
>>    rs->remote_async_inferior_event_token
>> -    = create_async_event_handler (remote_async_inferior_event_handler, NULL);
>> +    = create_async_event_handler (remote_async_inferior_event_handler, remote);
>>    rs->notif_state = remote_notif_state_allocate (remote);
>>  
>>    /* Reset the target state; these things will be queried either by
>> @@ -14164,6 +14164,19 @@ static void
>>  remote_async_inferior_event_handler (gdb_client_data data)
>>  {
>>    inferior_event_handler (INF_REG_EVENT);
>> +
>> +  remote_target *remote = (remote_target *) data;
>> +  remote_state *rs = remote->get_remote_state ();
>> +
>> +  /* inferior_event_handler may have consumed an event pending on the
>> +     infrun side without calling target_wait on the REMOTE target, or
>> +     may have pulled an event out of a different target.  Keep trying
>> +     for this remote target as long it still has either pending events
>> +     or unacknowledged notifications.  */
>> +
>> +  if (rs->notif_state->pending_event[notif_client_stop.id] != NULL
>> +      || !rs->stop_reply_queue.empty ())
>> +    mark_async_event_handler (rs->remote_async_inferior_event_token);
>>  }
> 
> Pedro,
> 
> This patch introduced a use after free issue here.  This can be seen
> by running the test:
> 
>   make check-gdb RUNTESTFLAGS="--target_board=native-gdbserver gdb.base/inferior-died.exp"
> 
> For me this fails maybe 1 in 5 times.  I've done some initial
> investigation at the problem is obvious one you see the following
> stack trace:
> 
>   #0  remote_state::~remote_state (this=0x338d548, __in_chrg=<optimized out>) at ../../src.dev-3/gdb/remote.c:1097
>   #1  0x0000000000acf3b3 in remote_target::~remote_target (this=0x338d530, __in_chrg=<optimized out>) at ../../src.dev-3/gdb/remote.c:4078
>   #2  0x0000000000acf3f6 in remote_target::~remote_target (this=0x338d530, __in_chrg=<optimized out>) at ../../src.dev-3/gdb/remote.c:4097
>   #3  0x0000000000acf2fa in remote_target::close (this=0x338d530) at ../../src.dev-3/gdb/remote.c:4075
>   #4  0x0000000000c75bfd in target_close (targ=0x338d530) at ../../src.dev-3/gdb/target.c:3126
>   #5  0x0000000000c62ca4 in decref_target (t=0x338d530) at ../../src.dev-3/gdb/target.c:545
>   #6  0x0000000000c62ec7 in target_stack::unpush (this=0x3666d50, t=0x338d530) at ../../src.dev-3/gdb/target.c:633
>   #7  0x0000000000c7796c in inferior::unpush_target (this=0x3666ba0, t=0x338d530) at ../../src.dev-3/gdb/inferior.h:357
>   #8  0x0000000000c62de1 in unpush_target (t=0x338d530) at ../../src.dev-3/gdb/target.c:595
>   #9  0x0000000000c62ee7 in unpush_target_and_assert (target=0x338d530) at ../../src.dev-3/gdb/target.c:643
>   #10 0x0000000000c62fb6 in pop_all_targets_at_and_above (stratum=process_stratum) at ../../src.dev-3/gdb/target.c:666
>   #11 0x0000000000ad21a4 in remote_unpush_target (target=0x338d530) at ../../src.dev-3/gdb/remote.c:5524
>   #12 0x0000000000adc619 in remote_target::mourn_inferior (this=0x338d530) at ../../src.dev-3/gdb/remote.c:9962
>   #13 0x0000000000c65e79 in target_mourn_inferior (ptid=...) at ../../src.dev-3/gdb/target.c:2136
>   #14 0x000000000086d0a5 in handle_inferior_event (ecs=0x7fffffffb3b0) at ../../src.dev-3/gdb/infrun.c:5234
>   #15 0x0000000000869beb in fetch_inferior_event () at ../../src.dev-3/gdb/infrun.c:3863
>   #16 0x000000000084a922 in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src.dev-3/gdb/inf-loop.c:42
>   #17 0x0000000000ae73a9 in remote_async_inferior_event_handler (data=0x338d530) at ../../src.dev-3/gdb/remote.c:14177
>   #18 0x00000000004ea759 in check_async_event_handlers () at ../../src.dev-3/gdb/async-event.c:328
>   #19 0x0000000001449e7a in gdb_do_one_event () at ../../src.dev-3/gdbsupport/event-loop.cc:216
>   #20 0x00000000009102d0 in start_event_loop () at ../../src.dev-3/gdb/main.c:347
>   #21 0x00000000009103f0 in captured_command_loop () at ../../src.dev-3/gdb/main.c:407
>   #22 0x0000000000911be6 in captured_main (data=0x7fffffffb640) at ../../src.dev-3/gdb/main.c:1239
>   #23 0x0000000000911c4c in gdb_main (args=0x7fffffffb640) at ../../src.dev-3/gdb/main.c:1254
>   #24 0x000000000041755d in main (argc=5, argv=0x7fffffffb748) at ../../src.dev-3/gdb/gdb.c:32
> 
> The inferior event being processed is the inferior exited event, this
> is the last remote inferior, and so the remote target is unpushed.
> GDB then returns to remote_async_inferior_event_handler where we hit
> the code you added above which proceeds to make use of the remote
> target :-/
> 
> Like I say, the problem is now obvious, but the solution less so!
> 
> Reading what you originally wrote in the patch I wondered about the
> idea of having it be the call back that is responsible for marking the
> async event handler as clear.
> 
> I haven't tried to fix this yet, but thought I'd share my findings so
> far with you.
> 
> Thanks,
> Andrew
> 

This patch series I proposed earlier should fix it:

https://sourceware.org/pipermail/gdb-patches/2020-November/173633.html

Simon

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-12-13  0:46 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-06 19:02 [PATCH 0/7] GDB busy loop when interrupting non-stop program (PR 26199) Pedro Alves
2020-07-06 19:02 ` [PATCH 1/7] Fix spurious unhandled remote %Stop notifications Pedro Alves
2020-12-12 22:13   ` Andrew Burgess
2020-12-13  0:46     ` Simon Marchi
2020-07-06 19:02 ` [PATCH 2/7] Fix latent bug in target_pass_ctrlc Pedro Alves
2020-07-06 19:02 ` [PATCH 3/7] Avoid constant stream of TARGET_WAITKIND_NO_RESUMED Pedro Alves
2020-07-06 19:02 ` [PATCH 4/7] Fix handle_no_resumed w/ multiple targets Pedro Alves
2020-07-06 19:02 ` [PATCH 5/7] Make handle_no_resumed transfer terminal Pedro Alves
2020-07-06 19:02 ` [PATCH 6/7] Testcase for previous handle_no_resumed fixes Pedro Alves
2020-07-06 19:02 ` [PATCH 7/7] Fix GDB busy loop when interrupting non-stop program (PR 26199) Pedro Alves
2020-07-06 21:28 ` [PATCH 0/7] " Simon Marchi
2020-07-07  0:25   ` Pedro Alves
2020-07-07  1:27     ` Pedro Alves
2020-07-07  1:29       ` Pedro Alves
2020-07-10 23:02 ` Pedro Alves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).