public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Pedro Alves <pedro@palves.net>
To: gdb-patches@sourceware.org
Subject: [PATCH 21/31] stop_all_threads: (re-)enable async before waiting for stops
Date: Mon, 12 Dec 2022 20:30:51 +0000	[thread overview]
Message-ID: <20221212203101.1034916-22-pedro@palves.net> (raw)
In-Reply-To: <20221212203101.1034916-1-pedro@palves.net>

Running the
gdb.threads/step-over-thread-exit-while-stop-all-threads.exp testcase
added later in the series against gdbserver, after the
TARGET_WAITKIND_NO_RESUMED fix from the following patch, would run
into an infinite loop in stop_all_threads, leading to a timeout:

  FAIL: gdb.threads/step-over-thread-exit-while-stop-all-threads.exp: displaced-stepping=off: target-non-stop=on: iter 0: continue (timeout)

The is really a latent bug, and it is about the fact that
stop_all_threads stops listening to events from a target as soon as it
sees a TARGET_WAITKIND_NO_RESUMED, ignoring that
TARGET_WAITKIND_NO_RESUMED may be delayed.  handle_no_resumed knows
how to handle delayed no-resumed events, but stop_all_threads was
never taught to.

In more detail, here's what happens with that testcase:

#1 - Multiple threads report breakpoint hits to gdb.

#2 - gdb picks one events, and it's for thread 1.  All other stops are
     left pending.  thread 1 needs to move past a breakpoint, so gdb
     stops all threads to start an inline step over for thread 1.
     While stopping threads, some of the threads that were still
     running report events that are also left pending.

#2 - gdb steps thread 1

#3 - Thread 1 exits while stepping (it steps over an exit syscall),
     gdbserver reports thread exit for thread 1

#4 - Thread 1 was the last resumed thread, so gdbserver also reports
     no-resumed:

    [remote]   Notification received: Stop:w0;p3445d0.3445d3
    [remote] Sending packet: $vStopped#55
    [remote] Packet received: N
    [remote] Sending packet: $vStopped#55
    [remote] Packet received: OK

#5 - gdb processes the thread exit for thread 1, finishes the step
     over and restarts threads.

#6 - gdb picks the next event to process out of one of the resumed
     threads with pending events:

    [infrun] random_resumed_with_pending_wait_status: Found 32 events, selecting #11

#7 - This is again a breakpoint hit and the breakpoint needs to be
     stepped over too, so gdb starts a step-over dance again.

#8 - We reach stop_all_threads, which finds that some threads need to
     be stopped.

#9 - wait_one finally consumes the no-resumed event queue by #4.
     Seeing this, wait_one disable target async, to stop listening for
     events out of the remote target.

#10 - We still haven't seen all the stops expected, so
      stop_all_threads tries another iteration.

#11 - Because the remote target is no longer async, and there are no
      other targets, wait_one return no-resumed immediately without
      polling the remote target.

#12 - We still haven't seen all the stops expected, so
      stop_all_threads tries another iteration.  goto #11, looping
      forever.

Fix this by explicitly enabling/re-enabling target async on targets
that can async, before waiting for stops.

Change-Id: Ie3ffb0df89635585a6631aa842689cecc989e33f
---
 gdb/infrun.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 2866962d2dc..31321d758da 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -5011,6 +5011,8 @@ wait_one ()
       if (nfds == 0)
 	{
 	  /* No waitable targets left.  All must be stopped.  */
+	  infrun_debug_printf ("no waitable targets left");
+
 	  target_waitstatus ws;
 	  ws.set_no_resumed ();
 	  return {nullptr, minus_one_ptid, std::move (ws)};
@@ -5269,6 +5271,83 @@ handle_one (const wait_one_event &event)
   return false;
 }
 
+/* Helper for stop_all_threads.  wait_one waits for events until it
+   sees a TARGET_WAITKIND_NO_RESUMED event.  When it sees one, it
+   disables target_async for the target to stop waiting for events
+   from it.  TARGET_WAITKIND_NO_RESUMED can be delayed though,
+   consider, debugging against gdbserver:
+
+    #1 - Threads 1-5 are running, and thread 1 hits a breakpoint.
+
+    #2 - gdb processes the breakpoint hit for thread 1, stops all
+	 threads, and steps thread 1 over the breakpoint.  while
+	 stopping threads, some other threads reported interesting
+	 events, which were left pending in the thread's objects
+	 (infrun's queue).
+
+    #2 - Thread 1 exits (it stepped an exit syscall), and gdbserver
+	 reports the thread exit for thread 1.	The event ends up in
+	 remote's stop reply queue.
+
+    #3 - That was the last resumed thread, so gdbserver reports
+	 no-resumed, and that event also ends up in remote's stop
+	 reply queue, queued after the thread exit from #2.
+
+    #4 - gdb processes the thread exit event, which finishes the
+	 step-over, and so gdb restarts all threads (threads with
+	 pending events are left marked resumed, but aren't set
+	 executing).  The no-resumed event is still left pending in
+	 the remote stop reply queue.
+
+    #5 - Since there are now resumed threads with pending breakpoint
+	 hits, gdb picks one at random to process next.
+
+    #5 - gdb picks the breakpoint hit for thread 2 this time, and that
+	 breakpoint also needs to be stepped over, so gdb stops all
+	 threads again.
+
+    #6 - stop_all_threads counts number of expected stops and calls
+	 wait_one once for each.
+
+    #7 - The first wait_one call collects the no-resumed event from #3
+	 above.
+
+    #9 - Seeing the no-resumed event, wait_one disables target async
+	 for the remote target, to stop waiting for events from it.
+	 wait_one from here on always return no-resumed directly
+	 without reaching the target.
+
+    #10 - stop_all_threads still hasn't seen all the stops it expects,
+	  so it does another pass.
+
+    #11 - Since the remote target is not async (disabled in #9),
+	  wait_one doesn't wait on it, so it won't see the expected
+	  stops, and instead returns no-resumed directly.
+
+    #12 - stop_all_threads still haven't seen all the stops, so it
+	  does another pass.  goto #b, looping forever.
+
+   To handle this, we explicitly (re-)enable target async on all
+   targets that can async every time stop_all_threads goes wait for
+   the expected stops.  */
+
+static void
+reenable_target_async ()
+{
+  for (inferior *inf : all_inferiors ())
+    {
+      process_stratum_target *target = inf->process_target ();
+      if (target != nullptr
+	  && target->threads_executing
+	  && target->can_async_p ()
+	  && !target->is_async_p ())
+	{
+	  switch_to_inferior_no_thread (inf);
+	  target_async (1);
+	}
+    }
+}
+
 /* See infrun.h.  */
 
 void
@@ -5395,6 +5474,8 @@ stop_all_threads (const char *reason, inferior *inf)
 	  if (pass > 0)
 	    pass = -1;
 
+	  reenable_target_async ();
+
 	  for (int i = 0; i < waits_needed; i++)
 	    {
 	      wait_one_event event = wait_one ();
-- 
2.36.0


  parent reply	other threads:[~2022-12-12 20:31 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-12 20:30 [PATCH 00/31] Step over thread clone and thread exit Pedro Alves
2022-12-12 20:30 ` [PATCH 01/31] displaced step: pass down target_waitstatus instead of gdb_signal Pedro Alves
2023-02-03 10:44   ` Andrew Burgess
2023-03-10 17:15     ` Pedro Alves
2023-03-16 16:07       ` Andrew Burgess
2023-03-22 21:29         ` Andrew Burgess
2023-03-23 15:15           ` Pedro Alves
2023-03-27 12:40             ` Andrew Burgess
2023-03-27 16:21               ` Pedro Alves
2022-12-12 20:30 ` [PATCH 02/31] linux-nat: introduce pending_status_str Pedro Alves
2023-02-03 12:00   ` Andrew Burgess
2023-03-10 17:15     ` Pedro Alves
2023-03-16 16:19       ` Andrew Burgess
2023-03-27 18:05         ` Pedro Alves
2022-12-12 20:30 ` [PATCH 03/31] gdb/linux: Delete all other LWPs immediately on ptrace exec event Pedro Alves
2023-03-21 14:50   ` Andrew Burgess
2023-04-04 13:57     ` Pedro Alves
2023-04-14 19:29       ` Pedro Alves
2023-05-26 15:04         ` Andrew Burgess
2023-11-13 14:04           ` Pedro Alves
2023-05-26 14:45       ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 04/31] Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONED Pedro Alves
2023-02-04 15:38   ` Andrew Burgess
2023-03-10 17:16     ` Pedro Alves
2023-03-21 16:06       ` Andrew Burgess
2023-11-13 14:05         ` Pedro Alves
2022-12-12 20:30 ` [PATCH 05/31] Support clone events in the remote protocol Pedro Alves
2023-03-22 15:46   ` Andrew Burgess
2023-11-13 14:05     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 06/31] Avoid duplicate QThreadEvents packets Pedro Alves
2023-05-26 15:53   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 07/31] enum_flags to_string Pedro Alves
2023-01-30 20:07   ` Simon Marchi
2022-12-12 20:30 ` [PATCH 08/31] Thread options & clone events (core + remote) Pedro Alves
2023-01-31 12:25   ` Lancelot SIX
2023-03-10 19:16     ` Pedro Alves
2023-06-06 13:29       ` Andrew Burgess
2023-11-13 14:07         ` Pedro Alves
2022-12-12 20:30 ` [PATCH 09/31] Thread options & clone events (native Linux) Pedro Alves
2023-06-06 13:43   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 10/31] Thread options & clone events (Linux GDBserver) Pedro Alves
2023-06-06 14:12   ` Andrew Burgess
2023-11-13 14:07     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 11/31] gdbserver: Hide and don't detach pending clone children Pedro Alves
2023-06-07 16:10   ` Andrew Burgess
2023-11-13 14:08     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 12/31] Remove gdb/19675 kfails (displaced stepping + clone) Pedro Alves
2023-06-07 17:08   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 13/31] Add test for stepping over clone syscall Pedro Alves
2023-06-07 17:42   ` Andrew Burgess
2023-11-13 14:09     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 14/31] all-stop/synchronous RSP support thread-exit events Pedro Alves
2023-06-07 17:52   ` Andrew Burgess
2023-11-13 14:11     ` Pedro Alves
2023-12-15 18:15       ` Pedro Alves
2022-12-12 20:30 ` [PATCH 15/31] gdbserver/linux-low.cc: Ignore event_ptid if TARGET_WAITKIND_IGNORE Pedro Alves
2022-12-12 20:30 ` [PATCH 16/31] Move deleting thread on TARGET_WAITKIND_THREAD_EXITED to core Pedro Alves
2023-06-08 12:27   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 17/31] Introduce GDB_THREAD_OPTION_EXIT thread option, fix step-over-thread-exit Pedro Alves
2023-06-08 13:17   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 18/31] Implement GDB_THREAD_OPTION_EXIT support for Linux GDBserver Pedro Alves
2023-06-08 14:14   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 19/31] Implement GDB_THREAD_OPTION_EXIT support for native Linux Pedro Alves
2023-06-08 14:17   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 20/31] gdb: clear step over information on thread exit (PR gdb/27338) Pedro Alves
2023-06-08 15:29   ` Andrew Burgess
2022-12-12 20:30 ` Pedro Alves [this message]
2023-06-08 15:49   ` [PATCH 21/31] stop_all_threads: (re-)enable async before waiting for stops Andrew Burgess
2023-11-13 14:12     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 22/31] gdbserver: Queue no-resumed event after thread exit Pedro Alves
2023-06-08 18:16   ` Andrew Burgess
2023-11-13 14:12     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 23/31] Don't resume new threads if scheduler-locking is in effect Pedro Alves
2023-06-08 18:24   ` Andrew Burgess
2023-11-13 14:12     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 24/31] Report thread exit event for leader if reporting thread exit events Pedro Alves
2023-06-09 13:11   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 25/31] Ignore failure to read PC when resuming Pedro Alves
2023-06-10 10:33   ` Andrew Burgess
2023-11-13 14:13     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 26/31] gdb/testsuite/lib/my-syscalls.S: Refactor new SYSCALL macro Pedro Alves
2023-06-10 10:33   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 27/31] Testcases for stepping over thread exit syscall (PR gdb/27338) Pedro Alves
2023-06-12  9:53   ` Andrew Burgess
2022-12-12 20:30 ` [PATCH 28/31] Document remote clone events, and QThreadOptions packet Pedro Alves
2023-06-05 15:53   ` Andrew Burgess
2023-11-13 14:13     ` Pedro Alves
2023-06-12 12:06   ` Andrew Burgess
2023-11-13 14:15     ` Pedro Alves
2022-12-12 20:30 ` [PATCH 29/31] inferior::clear_thread_list always silent Pedro Alves
2023-06-12 12:20   ` Andrew Burgess
2022-12-12 20:31 ` [PATCH 30/31] Centralize "[Thread ...exited]" notifications Pedro Alves
2023-02-04 16:05   ` Andrew Burgess
2023-03-10 17:21     ` Pedro Alves
2023-02-16 15:40   ` Andrew Burgess
2023-06-12 12:23     ` Andrew Burgess
2022-12-12 20:31 ` [PATCH 31/31] Cancel execution command on thread exit, when stepping, nexting, etc Pedro Alves
2023-06-12 13:12   ` Andrew Burgess
2023-01-24 19:47 ` [PATCH v3 00/31] Step over thread clone and thread exit Pedro Alves
2023-11-13 14:24 ` [PATCH " Pedro Alves

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221212203101.1034916-22-pedro@palves.net \
    --to=pedro@palves.net \
    --cc=gdb-patches@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).