[PATCH 0/9] Fixes for multiprocess for FreeBSD's native target

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target
@ 2023-02-28 18:18 John Baldwin
  2023-02-28 18:18 ` [PATCH 1/9] fbsd-nat: Add missing spaces John Baldwin
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: John Baldwin @ 2023-02-28 18:18 UTC (permalink / raw)
  To: gdb-patches

When I originally added multiprocess support to the FreeBSD target I
did not fully understand what that meant.  This series seeks to fix
several of my incorrect assumptions (e.g. the need to resume multiple
inferiors for a given call to resume, as well as the need to stop
other inferiors when reporting an event back from wait).

This does not add support for non-stop mode, but probably gets closer
to being able to support it.

In terms of test results, the before summary of a testsuite run on
FreeBSD/amd64 is:

                === gdb Summary ===

# of unexpected core files      11
# of expected passes            98135
# of unexpected failures        3104
# of expected failures          58
# of known failures             125
# of unresolved testcases       74
# of untested testcases         58
# of unsupported tests          818
# of duplicate test names       10

and the summary after this series is:

                === gdb Summary ===

# of unexpected core files      9
# of expected passes            98586
# of unexpected failures        3002
# of expected failures          58
# of known failures             125
# of unresolved testcases       54
# of untested testcases         58
# of unsupported tests          818
# of duplicate test names       9

Some of these other failures are improved further by the other
testsuite patches I posted earlier involving races in tests, or
matching on the output of 'info threads'.

I do see a few failures of the new assertions added in this series
still, namely a handful of tests do something I'm not sure is
expected: they schedule multiple individual threads from a single
process multiple resume() calls.  I think though this may be a test
that is assuming the target supports non-stop when it doesn't.  For
example, here are the debugging logs from a test that I caught that
does this:

Running /home/john/work/git/gdb/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp ...
...
(gdb) file /usr/home/john/work/git/gdb/obj/gdb/testsuite/outputs/gdb.threads/attach-many-short-lived-threads/attach-many-short-lived-threads
Reading symbols from /usr/home/john/work/git/gdb/obj/gdb/testsuite/outputs/gdb.threads/attach-many-short-lived-threads/attach-many-short-lived-threads...
(gdb) builtin_spawn /usr/home/john/work/git/gdb/obj/gdb/testsuite/outputs/gdb.threads/attach-many-short-lived-threads/attach-many-short-lived-threads
attach 44426
attach: running_lwps 1
Attaching to program: /usr/home/john/work/git/gdb/obj/gdb/testsuite/outputs/gdb.threads/attach-many-short-lived-threads/attach-many-short-lived-threads, process 44426
[New LWP 959198 of process 44426]
[New LWP 959404 of process 44426]
[New LWP 959532 of process 44426]
[New LWP 959544 of process 44426]
[New LWP 959568 of process 44426]
...
[Switching to LWP 432744 of process 44426]
_nanosleep () at _nanosleep.S:4
4       PSEUDO(nanosleep)
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: attach
info threads
  Id   Target Id                              Frame 
* 1    LWP 432744 of process 44426            _nanosleep () at _nanosleep.S:4
  2    LWP 959198 of process 44426 "detached" _umtx_op_err () at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:40
...
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: no new threads
set breakpoint always-inserted on
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set breakpoint always-inserted on
break break_fn
Breakpoint 1 at 0x400ebc: file /home/john/work/git/gdb/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.c, line 57.
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: break break_fn
set debug infrun on
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set debug infrun on
set debug fbsd-nat on
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set debug fbsd-nat on
continue
Continuing.
[infrun] clear_proceed_status_thread: 44426.432744.0
[infrun] clear_proceed_status_thread: 44426.959198.0
[infrun] clear_proceed_status_thread: 44426.959404.0
...
[infrun] clear_proceed_status_thread: 44426.960473.0
[infrun] clear_proceed_status_thread: 44426.960474.0
[infrun] clear_proceed_status_thread: 44426.960475.0
[infrun] proceed: enter
  [infrun] user_visible_resume_ptid: resuming all threads of current process
  [infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT
  [infrun] scoped_disable_commit_resumed: reason=proceeding
  [infrun] start_step_over: enter
    [infrun] start_step_over: stealing global queue of threads to step, length = 0
    [infrun] operator(): step-over queue now empty
  [infrun] start_step_over: exit
  [infrun] proceed: start: resuming threads, all-stop-on-top-of-non-stop
    [infrun] proceed: resuming 44426.432744.0
    [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [44426.432744.0] at 0x823baf64a
    [infrun] internal_resume_ptid: non_stop uses inferior_ptid
    [infrun] do_target_resume: resume_ptid=44426.432744.0, step=0, sig=GDB_SIGNAL_0
    [fbsd-nat] resume: start: [LWP 432744 of process 44426], step 0, signo 0 (0)
      [fbsd-nat] resume_one_process: [LWP 432744 of process 44426], step 0, signo 0 (0)
    [fbsd-nat] resume: end: [LWP 432744 of process 44426], step 0, signo 0 (0)
    [infrun] infrun_async: enable=1
    [infrun] prepare_to_wait: prepare_to_wait
    [infrun] proceed: resuming 44426.959198.0
    [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [44426.959198.0] at 0x821d32ccc
    [infrun] internal_resume_ptid: non_stop uses inferior_ptid
    [infrun] do_target_resume: resume_ptid=44426.959198.0, step=0, sig=GDB_SIGNAL_0
    [fbsd-nat] resume: start: [LWP 959198 of process 44426], step 0, signo 0 (0)
      [fbsd-nat] resume_one_process: [LWP 959198 of process 44426], step 0, signo 0 (0)
pid 44426: running_lwps 1
/home/john/work/git/gdb/gdb/fbsd-nat.c:1199: internal-error: resume_one_process: Assertion `info->running_lwps == 0' failed.

I've started on a patch to defer actual process resumes until either
wait or commit_resumed is called, but that has so far caused more
regressions than fixes, so I'm not including it here.  Also, the
logged line of 'start: resuming threads, all-stop-on-top-of-non-stop'
when the target is all-stop only makes me think the test is perhaps
broken in this case.

John Baldwin (9):
  fbsd-nat: Add missing spaces.
  fbsd-nat: Avoid a direct write to target_waitstatus::kind.
  fbsd-nat: Use correct constant for target_waitstatus::sig.
  fbsd-nat: Add a list of pending events.
  fbsd-nat: Defer any ineligible events reported by wait.
  fbsd-nat: Fix resuming and waiting with multiple processes.
  fbsd-nat: Fix several issues with detaching.
  fbsd-nat: Fix thread_alive against a running thread.
  fbsd-nat: Stop a process if it is running before killing it.

 gdb/fbsd-nat.c | 892 +++++++++++++++++++++++++++++++++++++++++--------
 gdb/fbsd-nat.h |  14 +
 2 files changed, 769 insertions(+), 137 deletions(-)

-- 
2.38.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/9] fbsd-nat: Add missing spaces.
  2023-02-28 18:18 [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
@ 2023-02-28 18:18 ` John Baldwin
  2023-03-20 17:43   ` Simon Marchi
  2023-02-28 18:18 ` [PATCH 2/9] fbsd-nat: Avoid a direct write to target_waitstatus::kind John Baldwin
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: John Baldwin @ 2023-02-28 18:18 UTC (permalink / raw)
  To: gdb-patches

No functional change, just style fixes.
---
 gdb/fbsd-nat.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
index 27d2fe45092..4e48bfa1590 100644
--- a/gdb/fbsd-nat.c
+++ b/gdb/fbsd-nat.c
@@ -852,23 +852,23 @@ fbsd_enable_proc_events (pid_t pid)
 #ifdef PT_GET_EVENT_MASK
   int events;
 
-  if (ptrace (PT_GET_EVENT_MASK, pid, (PTRACE_TYPE_ARG3)&events,
+  if (ptrace (PT_GET_EVENT_MASK, pid, (PTRACE_TYPE_ARG3) &events,
 	      sizeof (events)) == -1)
     perror_with_name (("ptrace (PT_GET_EVENT_MASK)"));
   events |= PTRACE_FORK | PTRACE_LWP;
 #ifdef PTRACE_VFORK
   events |= PTRACE_VFORK;
 #endif
-  if (ptrace (PT_SET_EVENT_MASK, pid, (PTRACE_TYPE_ARG3)&events,
+  if (ptrace (PT_SET_EVENT_MASK, pid, (PTRACE_TYPE_ARG3) &events,
 	      sizeof (events)) == -1)
     perror_with_name (("ptrace (PT_SET_EVENT_MASK)"));
 #else
 #ifdef TDP_RFPPWAIT
-  if (ptrace (PT_FOLLOW_FORK, pid, (PTRACE_TYPE_ARG3)0, 1) == -1)
+  if (ptrace (PT_FOLLOW_FORK, pid, (PTRACE_TYPE_ARG3) 0, 1) == -1)
     perror_with_name (("ptrace (PT_FOLLOW_FORK)"));
 #endif
 #ifdef PT_LWP_EVENTS
-  if (ptrace (PT_LWP_EVENTS, pid, (PTRACE_TYPE_ARG3)0, 1) == -1)
+  if (ptrace (PT_LWP_EVENTS, pid, (PTRACE_TYPE_ARG3) 0, 1) == -1)
     perror_with_name (("ptrace (PT_LWP_EVENTS)"));
 #endif
 #endif
@@ -1369,7 +1369,7 @@ fbsd_nat_target::wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
 
 		  gdb_assert (pid == child);
 
-		  if (ptrace (PT_LWPINFO, child, (caddr_t)&pl, sizeof pl) == -1)
+		  if (ptrace (PT_LWPINFO, child, (caddr_t) &pl, sizeof pl) == -1)
 		    perror_with_name (("ptrace (PT_LWPINFO)"));
 
 		  gdb_assert (pl.pl_flags & PL_FLAG_CHILD);
@@ -1489,7 +1489,7 @@ fbsd_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
      event loop keeps polling until no event is returned.  */
   if (is_async_p ()
       && ((ourstatus->kind () != TARGET_WAITKIND_IGNORE
-	  && ourstatus->kind() != TARGET_WAITKIND_NO_RESUMED)
+	  && ourstatus->kind () != TARGET_WAITKIND_NO_RESUMED)
 	  || ptid != minus_one_ptid))
     async_file_mark ();
 
@@ -1607,7 +1607,7 @@ fbsd_nat_target::follow_fork (inferior *child_inf, ptid_t child_ptid,
       /* Breakpoints have already been detached from the child by
 	 infrun.c.  */
 
-      if (ptrace (PT_DETACH, child_pid, (PTRACE_TYPE_ARG3)1, 0) == -1)
+      if (ptrace (PT_DETACH, child_pid, (PTRACE_TYPE_ARG3) 1, 0) == -1)
 	perror_with_name (("ptrace (PT_DETACH)"));
 
 #ifndef PTRACE_VFORK
@@ -1740,7 +1740,7 @@ fbsd_nat_target::fetch_register_set (struct regcache *regcache, int regnum,
 
   if (regnum == -1
       || (regnum >= regbase && regcache_map_supplies (map, regnum - regbase,
-						      regcache->arch(), size)))
+						      regcache->arch (), size)))
     {
       if (ptrace (fetch_op, pid, (PTRACE_TYPE_ARG3) regs, 0) == -1)
 	perror_with_name (_("Couldn't get registers"));
@@ -1765,7 +1765,7 @@ fbsd_nat_target::store_register_set (struct regcache *regcache, int regnum,
 
   if (regnum == -1
       || (regnum >= regbase && regcache_map_supplies (map, regnum - regbase,
-						      regcache->arch(), size)))
+						      regcache->arch (), size)))
     {
       if (ptrace (fetch_op, pid, (PTRACE_TYPE_ARG3) regs, 0) == -1)
 	perror_with_name (_("Couldn't get registers"));
@@ -1807,7 +1807,7 @@ fbsd_nat_target::fetch_regset (struct regcache *regcache, int regnum, int note,
 
   if (regnum == -1
       || (regnum >= regbase && regcache_map_supplies (map, regnum - regbase,
-						      regcache->arch(), size)))
+						      regcache->arch (), size)))
     {
       struct iovec iov;
 
@@ -1833,7 +1833,7 @@ fbsd_nat_target::store_regset (struct regcache *regcache, int regnum, int note,
 
   if (regnum == -1
       || (regnum >= regbase && regcache_map_supplies (map, regnum - regbase,
-						      regcache->arch(), size)))
+						      regcache->arch (), size)))
     {
       struct iovec iov;
 
-- 
2.38.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/9] fbsd-nat: Avoid a direct write to target_waitstatus::kind.
  2023-02-28 18:18 [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
  2023-02-28 18:18 ` [PATCH 1/9] fbsd-nat: Add missing spaces John Baldwin
@ 2023-02-28 18:18 ` John Baldwin
  2023-03-20 17:45   ` Simon Marchi
  2023-02-28 18:18 ` [PATCH 3/9] fbsd-nat: Use correct constant for target_waitstatus::sig John Baldwin
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: John Baldwin @ 2023-02-28 18:18 UTC (permalink / raw)
  To: gdb-patches

This is in #ifdef'd code for a workaround for FreeBSD versions older
than 11.1 which is why it wasn't caught earlier.
---
 gdb/fbsd-nat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
index 4e48bfa1590..2f5b512fb33 100644
--- a/gdb/fbsd-nat.c
+++ b/gdb/fbsd-nat.c
@@ -1261,7 +1261,7 @@ fbsd_nat_target::wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
       wptid = fbsd_next_vfork_done ();
       if (wptid != null_ptid)
 	{
-	  ourstatus->kind = TARGET_WAITKIND_VFORK_DONE;
+	  ourstatus->set_vfork_done ();
 	  return wptid;
 	}
 #endif
-- 
2.38.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 3/9] fbsd-nat: Use correct constant for target_waitstatus::sig.
  2023-02-28 18:18 [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
  2023-02-28 18:18 ` [PATCH 1/9] fbsd-nat: Add missing spaces John Baldwin
  2023-02-28 18:18 ` [PATCH 2/9] fbsd-nat: Avoid a direct write to target_waitstatus::kind John Baldwin
@ 2023-02-28 18:18 ` John Baldwin
  2023-03-20 17:47   ` Simon Marchi
  2023-02-28 18:18 ` [PATCH 4/9] fbsd-nat: Add a list of pending events John Baldwin
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: John Baldwin @ 2023-02-28 18:18 UTC (permalink / raw)
  To: gdb-patches

Use GDB_SIGNAL_TRAP instead of SIGTRAP.  This is a no-op since the
value of SIGTRAP on FreeBSD matches the value of GDB_SIGNAL_TRAP, but
it is more correct.
---
 gdb/fbsd-nat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
index 2f5b512fb33..04d67fc5278 100644
--- a/gdb/fbsd-nat.c
+++ b/gdb/fbsd-nat.c
@@ -1439,7 +1439,7 @@ fbsd_nat_target::wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
 	     SIGTRAP, so only treat SIGTRAP events as system call
 	     entry/exit events.  */
 	  if (pl.pl_flags & (PL_FLAG_SCE | PL_FLAG_SCX)
-	      && ourstatus->sig () == SIGTRAP)
+	      && ourstatus->sig () == GDB_SIGNAL_TRAP)
 	    {
 #ifdef HAVE_STRUCT_PTRACE_LWPINFO_PL_SYSCALL_CODE
 	      if (catch_syscall_enabled ())
-- 
2.38.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 4/9] fbsd-nat: Add a list of pending events.
  2023-02-28 18:18 [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
                   ` (2 preceding siblings ...)
  2023-02-28 18:18 ` [PATCH 3/9] fbsd-nat: Use correct constant for target_waitstatus::sig John Baldwin
@ 2023-02-28 18:18 ` John Baldwin
  2023-03-20 18:35   ` Simon Marchi
  2023-02-28 18:18 ` [PATCH 5/9] fbsd-nat: Defer any ineligible events reported by wait John Baldwin
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: John Baldwin @ 2023-02-28 18:18 UTC (permalink / raw)
  To: gdb-patches

The pending_events list stores a queue of deferred events that might
be reported by the next call to the target's wait method.  The set of
events that are eligible is filtered by the ptid passed to resume.

For now this just replaces the list of vfork_done events.  A
subsequent commit will reuse this to store other events.
---
 gdb/fbsd-nat.c | 146 ++++++++++++++++++++++++++++++++-----------------
 gdb/fbsd-nat.h |   2 +
 2 files changed, 98 insertions(+), 50 deletions(-)

diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
index 04d67fc5278..ca278b871ef 100644
--- a/gdb/fbsd-nat.c
+++ b/gdb/fbsd-nat.c
@@ -54,6 +54,66 @@
 #define	PT_SETREGSET	43	/* Set a target register set */
 #endif
 
+/* Filter for ptid's allowed to report events from wait.  Normally set
+   in resume, but also reset to minus_one_ptid in create_inferior and
+   attach.  */
+
+static ptid_t resume_ptid;
+
+/* If an event is triggered asynchronously (fake vfork_done events) or
+   occurs when the core is not expecting it, a pending event is
+   created.  This event is then returned by a future call to the
+   target wait method.  */
+
+struct pending_event
+{
+  pending_event (const ptid_t &_ptid, const target_waitstatus &_status) :
+    ptid(_ptid), status(_status) {}
+
+  ptid_t ptid;
+  target_waitstatus status;
+};
+
+static std::list<pending_event> pending_events;
+
+/* Add a new pending event to the list.  */
+
+static void
+add_pending_event (const ptid_t &ptid, const target_waitstatus &status)
+{
+  pending_events.emplace_back (ptid, status);
+}
+
+/* Return true if there is a pending event matching FILTER.  */
+
+static bool
+have_pending_event (ptid_t filter)
+{
+  for (const pending_event &event : pending_events)
+    if (event.ptid.matches (filter))
+      return true;
+  return false;
+}
+
+/* Helper method called by the target wait method.  Returns true if
+   there is a pending event matching resume_ptid.  If there is a
+   matching event, PTID and *STATUS contain the event details, and the
+   event is removed from the pending list.  */
+
+static bool
+take_pending_event (ptid_t &ptid, target_waitstatus *status)
+{
+  for (auto it = pending_events.begin (); it != pending_events.end (); it++)
+    if (it->ptid.matches (resume_ptid))
+      {
+	ptid = it->ptid;
+	*status = it->status;
+	pending_events.erase (it);
+	return true;
+      }
+  return false;
+}
+
 /* Return the name of a file that can be opened to get the symbols for
    the child process identified by PID.  */
 
@@ -1061,47 +1121,20 @@ fbsd_is_child_pending (pid_t pid)
 }
 
 #ifndef PTRACE_VFORK
-static std::forward_list<ptid_t> fbsd_pending_vfork_done;
-
 /* Record a pending vfork done event.  */
 
 static void
 fbsd_add_vfork_done (ptid_t pid)
 {
-  fbsd_pending_vfork_done.push_front (pid);
+  target_waitstatus status;
+  status.set_vfork_done ();
+  add_pending_event (ptid, status);
 
   /* If we're in async mode, need to tell the event loop there's
      something here to process.  */
   if (target_is_async_p ())
     async_file_mark ();
 }
-
-/* Check for a pending vfork done event for a specific PID.  */
-
-static int
-fbsd_is_vfork_done_pending (pid_t pid)
-{
-  for (auto it = fbsd_pending_vfork_done.begin ();
-       it != fbsd_pending_vfork_done.end (); it++)
-    if (it->pid () == pid)
-      return 1;
-  return 0;
-}
-
-/* Check for a pending vfork done event.  If one is found, remove it
-   from the list and return the PTID.  */
-
-static ptid_t
-fbsd_next_vfork_done (void)
-{
-  if (!fbsd_pending_vfork_done.empty ())
-    {
-      ptid_t ptid = fbsd_pending_vfork_done.front ();
-      fbsd_pending_vfork_done.pop_front ();
-      return ptid;
-    }
-  return null_ptid;
-}
 #endif
 #endif
 
@@ -1110,21 +1143,18 @@ fbsd_next_vfork_done (void)
 void
 fbsd_nat_target::resume (ptid_t ptid, int step, enum gdb_signal signo)
 {
-#if defined(TDP_RFPPWAIT) && !defined(PTRACE_VFORK)
-  pid_t pid;
-
-  /* Don't PT_CONTINUE a process which has a pending vfork done event.  */
-  if (minus_one_ptid == ptid)
-    pid = inferior_ptid.pid ();
-  else
-    pid = ptid.pid ();
-  if (fbsd_is_vfork_done_pending (pid))
-    return;
-#endif
-
   fbsd_nat_debug_printf ("[%s], step %d, signo %d (%s)",
 			 target_pid_to_str (ptid).c_str (), step, signo,
 			 gdb_signal_to_name (signo));
+
+  /* Don't PT_CONTINUE a thread or process which has a pending event.  */
+  resume_ptid = ptid;
+  if (have_pending_event (ptid))
+    {
+      fbsd_nat_debug_printf ("found pending event");
+      return;
+    }
+
   if (ptid.lwp_p ())
     {
       /* If ptid is a specific LWP, suspend all other LWPs in the process.  */
@@ -1257,14 +1287,6 @@ fbsd_nat_target::wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
 
   while (1)
     {
-#ifndef PTRACE_VFORK
-      wptid = fbsd_next_vfork_done ();
-      if (wptid != null_ptid)
-	{
-	  ourstatus->set_vfork_done ();
-	  return wptid;
-	}
-#endif
       wptid = inf_ptrace_target::wait (ptid, ourstatus, target_options);
       if (ourstatus->kind () == TARGET_WAITKIND_STOPPED)
 	{
@@ -1478,6 +1500,16 @@ fbsd_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
   fbsd_nat_debug_printf ("[%s], [%s]", target_pid_to_str (ptid).c_str (),
 			 target_options_to_string (target_options).c_str ());
 
+  /* If there is a valid pending event, return it.  */
+  if (take_pending_event (wptid, ourstatus))
+    {
+      fbsd_nat_debug_printf ("returning pending event [%s], [%s]",
+			     target_pid_to_str (wptid).c_str (),
+			     ourstatus->to_string ().c_str ());
+      gdb_assert (wptid.matches (ptid));
+      return wptid;
+    }
+
   /* Ensure any subsequent events trigger a new event in the loop.  */
   if (is_async_p ())
     async_file_flush ();
@@ -1585,9 +1617,23 @@ fbsd_nat_target::create_inferior (const char *exec_file,
     (disable_randomization);
 #endif
 
+  /* Expect a wait for the new process.  */
+  resume_ptid = minus_one_ptid;
+  fbsd_nat_debug_printf ("setting resume_ptid to [%s]",
+			 target_pid_to_str (resume_ptid).c_str ());
   inf_ptrace_target::create_inferior (exec_file, allargs, env, from_tty);
 }
 
+void
+fbsd_nat_target::attach (const char *args, int from_tty)
+{
+  /* Expect a wait for the new process.  */
+  resume_ptid = minus_one_ptid;
+  fbsd_nat_debug_printf ("setting resume_ptid to [%s]",
+			 target_pid_to_str (resume_ptid).c_str ());
+  inf_ptrace_target::attach (args, from_tty);
+}
+
 #ifdef TDP_RFPPWAIT
 /* Target hook for follow_fork.  On entry and at return inferior_ptid is
    the ptid of the followed inferior.  */
diff --git a/gdb/fbsd-nat.h b/gdb/fbsd-nat.h
index a19bceaf5e4..cda150ac465 100644
--- a/gdb/fbsd-nat.h
+++ b/gdb/fbsd-nat.h
@@ -76,6 +76,8 @@ class fbsd_nat_target : public inf_ptrace_target
   void create_inferior (const char *, const std::string &,
 			char **, int) override;
 
+  void attach (const char *, int) override;
+
   void resume (ptid_t, int, enum gdb_signal) override;
 
   ptid_t wait (ptid_t, struct target_waitstatus *, target_wait_flags) override;
-- 
2.38.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 5/9] fbsd-nat: Defer any ineligible events reported by wait.
  2023-02-28 18:18 [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
                   ` (3 preceding siblings ...)
  2023-02-28 18:18 ` [PATCH 4/9] fbsd-nat: Add a list of pending events John Baldwin
@ 2023-02-28 18:18 ` John Baldwin
  2023-03-20 19:09   ` Simon Marchi
  2023-02-28 18:18 ` [PATCH 6/9] fbsd-nat: Fix resuming and waiting with multiple processes John Baldwin
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: John Baldwin @ 2023-02-28 18:18 UTC (permalink / raw)
  To: gdb-patches

If wait_1 finds an event for a thread or process that does not match
the set of threads and processes previously resumed, defer the event.
If the event is for a specific thread, suspend the thread and continue
the associated process before waiting for another event.

One specific example of such an event is if a thread is created while
another thread in the same process hits a breakpoint.  If the second
thread's event is reported first, the target resume method does not
yet "know" about the new thread and will not suspend it via
PT_SUSPEND.  When wait is called, it will probably return the event
from the first thread before the result of the step from second
thread.  This is the case reported in PR 21497.
---
 gdb/fbsd-nat.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
index ca278b871ef..3f7278c6ea0 100644
--- a/gdb/fbsd-nat.c
+++ b/gdb/fbsd-nat.c
@@ -1514,7 +1514,39 @@ fbsd_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
   if (is_async_p ())
     async_file_flush ();
 
-  wptid = wait_1 (ptid, ourstatus, target_options);
+  while (1)
+    {
+      wptid = wait_1 (ptid, ourstatus, target_options);
+
+      /* If no event was found, just return.  */
+      if (ourstatus->kind () == TARGET_WAITKIND_IGNORE
+	  || ourstatus->kind () == TARGET_WAITKIND_NO_RESUMED)
+	break;
+
+      /* If an event is reported for a thread or process while
+	 stepping some other thread, suspend the thread reporting the
+	 event and defer the event until it can be reported to the
+	 core.  */
+      if (!wptid.matches (resume_ptid))
+	{
+	  add_pending_event (wptid, *ourstatus);
+	  fbsd_nat_debug_printf ("deferring event [%s], [%s]",
+				 target_pid_to_str (wptid).c_str (),
+				 ourstatus->to_string ().c_str ());
+	  if (wptid.pid () == resume_ptid.pid ())
+	    {
+	      fbsd_nat_debug_printf ("suspending thread [%s]",
+				     target_pid_to_str (wptid).c_str ());
+	      if (ptrace (PT_SUSPEND, wptid.lwp (), NULL, 0) == -1)
+		perror_with_name (("ptrace (PT_SUSPEND)"));
+	      if (ptrace (PT_CONTINUE, wptid.pid (), (caddr_t) 1, 0) == -1)
+		perror_with_name (("ptrace (PT_CONTINUE)"));
+	    }
+	  continue;
+	}
+
+      break;
+    }
 
   /* If we are in async mode and found an event, there may still be
      another event pending.  Trigger the event pipe so that that the
-- 
2.38.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 6/9] fbsd-nat: Fix resuming and waiting with multiple processes.
  2023-02-28 18:18 [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
                   ` (4 preceding siblings ...)
  2023-02-28 18:18 ` [PATCH 5/9] fbsd-nat: Defer any ineligible events reported by wait John Baldwin
@ 2023-02-28 18:18 ` John Baldwin
  2023-03-20 19:55   ` Simon Marchi
  2023-02-28 18:18 ` [PATCH 7/9] fbsd-nat: Fix several issues with detaching John Baldwin
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: John Baldwin @ 2023-02-28 18:18 UTC (permalink / raw)
  To: gdb-patches

I did not fully understand the requirements of multiple process
support when I enabled it previously and several parts were broken.
In particular, the resume method was only resuming a single process,
and wait was not stopping other processes when reporting an event.

To support multiple running inferiors, add a new per-inferior
structure which trackes the number of existing and running LWPs for
each process.  The structure also stores a ptid_t describing the
set of LWPs currently resumed for each process.

For the resume method, iterate over all non-exited inferiors resuming
each process matching the passed in ptid rather than only resuming the
current inferior's process for a wildcard ptid.  If a resumed process
has a pending event, don't actually resume the process, but other
matching processes without a pending event are still resumed in case
the later call to the wait method requests an event from one of the
processes without a pending event.

For the wait method, stop other running processes before returning an
event to the core.  When stopping a process, first check to see if an
event is already pending.  If it is, queue the event to be reported
later.  If not, send a SIGSTOP to the process and wait for it to stop.
If the event reported by the wait is not for the SIGSTOP, queue the
event and remember to ignore a future SIGSTOP event for the process.

Note that, unlike the Linux native target, entire processes are
stopped rather than individual LWPs.  In FreeBSD one can only wait on
processes (via pid), not for an event from a specific thread.

Other changes in this commit handle bookkeeping for the per-inferior
data such as purging the data in the mourn_inferior method and
migrating the data to the new inferior in the follow_exec method.  The
per-inferior data is created in the attach, create_inferior, and
follow_fork methods.
---
 gdb/fbsd-nat.c | 403 +++++++++++++++++++++++++++++++++++++------------
 gdb/fbsd-nat.h |   8 +
 2 files changed, 317 insertions(+), 94 deletions(-)

diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
index 3f7278c6ea0..14b31ddd86e 100644
--- a/gdb/fbsd-nat.c
+++ b/gdb/fbsd-nat.c
@@ -54,11 +54,26 @@
 #define	PT_SETREGSET	43	/* Set a target register set */
 #endif
 
-/* Filter for ptid's allowed to report events from wait.  Normally set
-   in resume, but also reset to minus_one_ptid in create_inferior and
-   attach.  */
+/* Information stored about each inferior.  */
 
-static ptid_t resume_ptid;
+struct fbsd_inferior_info
+{
+  /* Filter for resumed LWPs which can report events from wait.  */
+  ptid_t resumed_lwps = null_ptid;
+
+  /* Number of LWPs this process contains.  */
+  unsigned int num_lwps = 0;
+
+  /* Number of LWPs currently running.  */
+  unsigned int running_lwps = 0;
+
+  /* Have a pending SIGSTOP event that needs to be discarded.  */
+  bool pending_sigstop = false;
+};
+
+/* Per-inferior data key.  */
+
+static const registry<inferior>::key<fbsd_inferior_info> fbsd_inferior_data;
 
 /* If an event is triggered asynchronously (fake vfork_done events) or
    occurs when the core is not expecting it, a pending event is
@@ -95,21 +110,27 @@ have_pending_event (ptid_t filter)
   return false;
 }
 
-/* Helper method called by the target wait method.  Returns true if
-   there is a pending event matching resume_ptid.  If there is a
-   matching event, PTID and *STATUS contain the event details, and the
-   event is removed from the pending list.  */
+/* Returns true if there is a pending event for a resumed process
+   matching FILTER.  If there is a matching event, PTID and *STATUS
+   contain the event details, and the event is removed from the
+   pending list.  */
 
 static bool
-take_pending_event (ptid_t &ptid, target_waitstatus *status)
+take_pending_event (fbsd_nat_target *target, ptid_t filter, ptid_t &ptid,
+		    target_waitstatus *status)
 {
   for (auto it = pending_events.begin (); it != pending_events.end (); it++)
-    if (it->ptid.matches (resume_ptid))
+    if (it->ptid.matches (filter))
       {
-	ptid = it->ptid;
-	*status = it->status;
-	pending_events.erase (it);
-	return true;
+	inferior *inf = find_inferior_ptid (target, it->ptid);
+	fbsd_inferior_info *info = fbsd_inferior_data.get (inf);
+	if (it->ptid.matches (info->resumed_lwps))
+	  {
+	    ptid = it->ptid;
+	    *status = it->status;
+	    pending_events.erase (it);
+	    return true;
+	  }
       }
   return false;
 }
@@ -799,6 +820,8 @@ show_fbsd_nat_debug (struct ui_file *file, int from_tty,
 #define fbsd_nat_debug_printf(fmt, ...) \
   debug_prefixed_printf_cond (debug_fbsd_nat, "fbsd-nat", fmt, ##__VA_ARGS__)
 
+#define fbsd_nat_debug_start_end(fmt, ...) \
+  scoped_debug_start_end (debug_fbsd_nat, "fbsd-nat", fmt, ##__VA_ARGS__)
 
 /*
   FreeBSD's first thread support was via a "reentrant" version of libc
@@ -956,6 +979,9 @@ fbsd_add_threads (fbsd_nat_target *target, pid_t pid)
   if (nlwps == -1)
     perror_with_name (("ptrace (PT_GETLWPLIST)"));
 
+  inferior *inf = find_inferior_ptid (target, ptid_t (pid));
+  fbsd_inferior_info *info = fbsd_inferior_data.get (inf);
+  gdb_assert (info != nullptr);
   for (i = 0; i < nlwps; i++)
     {
       ptid_t ptid = ptid_t (pid, lwps[i]);
@@ -974,8 +1000,14 @@ fbsd_add_threads (fbsd_nat_target *target, pid_t pid)
 #endif
 	  fbsd_lwp_debug_printf ("adding thread for LWP %u", lwps[i]);
 	  add_thread (target, ptid);
+#ifdef PT_LWP_EVENTS
+	  info->num_lwps++;
+#endif
 	}
     }
+#ifndef PT_LWP_EVENTS
+  info->num_lwps = nlwps;
+#endif
 }
 
 /* Implement the "update_thread_list" target_ops method.  */
@@ -1138,92 +1170,117 @@ fbsd_add_vfork_done (ptid_t pid)
 #endif
 #endif
 
-/* Implement the "resume" target_ops method.  */
+/* Resume a single process.  */
 
 void
-fbsd_nat_target::resume (ptid_t ptid, int step, enum gdb_signal signo)
+fbsd_nat_target::resume_one_process (ptid_t ptid, int step,
+				     enum gdb_signal signo)
 {
   fbsd_nat_debug_printf ("[%s], step %d, signo %d (%s)",
 			 target_pid_to_str (ptid).c_str (), step, signo,
 			 gdb_signal_to_name (signo));
 
+  inferior *inf = find_inferior_ptid (this, ptid);
+  fbsd_inferior_info *info = fbsd_inferior_data.get (inf);
+  info->resumed_lwps = ptid;
+  gdb_assert (info->running_lwps == 0);
+
   /* Don't PT_CONTINUE a thread or process which has a pending event.  */
-  resume_ptid = ptid;
   if (have_pending_event (ptid))
     {
       fbsd_nat_debug_printf ("found pending event");
       return;
     }
 
-  if (ptid.lwp_p ())
+  for (thread_info *tp : inf->non_exited_threads ())
     {
-      /* If ptid is a specific LWP, suspend all other LWPs in the process.  */
-      inferior *inf = find_inferior_ptid (this, ptid);
+      int request;
 
-      for (thread_info *tp : inf->non_exited_threads ())
-	{
-	  int request;
-
-	  if (tp->ptid.lwp () == ptid.lwp ())
-	    request = PT_RESUME;
-	  else
-	    request = PT_SUSPEND;
-
-	  if (ptrace (request, tp->ptid.lwp (), NULL, 0) == -1)
-	    perror_with_name (request == PT_RESUME ?
-			      ("ptrace (PT_RESUME)") :
-			      ("ptrace (PT_SUSPEND)"));
-	  if (request == PT_RESUME)
-	    low_prepare_to_resume (tp);
-	}
-    }
-  else
-    {
-      /* If ptid is a wildcard, resume all matching threads (they won't run
-	 until the process is continued however).  */
-      for (thread_info *tp : all_non_exited_threads (this, ptid))
+      /* If ptid is a specific LWP, suspend all other LWPs in the
+	 process, otherwise resume all LWPs in the process..  */
+      if (!ptid.lwp_p() || tp->ptid.lwp () == ptid.lwp ())
 	{
 	  if (ptrace (PT_RESUME, tp->ptid.lwp (), NULL, 0) == -1)
 	    perror_with_name (("ptrace (PT_RESUME)"));
 	  low_prepare_to_resume (tp);
+	  info->running_lwps++;
 	}
+      else
+	{
+	  if (ptrace (PT_SUSPEND, tp->ptid.lwp (), NULL, 0) == -1)
+	    perror_with_name (("ptrace (PT_SUSPEND)"));
+	}
+    }
+
+  if (ptid.pid () != inferior_ptid.pid ())
+    {
+      step = 0;
+      signo = GDB_SIGNAL_0;
+      gdb_assert (!ptid.lwp_p ());
+    }
+  else
+    {
       ptid = inferior_ptid;
-    }
-
 #if __FreeBSD_version < 1200052
-  /* When multiple threads within a process wish to report STOPPED
-     events from wait(), the kernel picks one thread event as the
-     thread event to report.  The chosen thread event is retrieved via
-     PT_LWPINFO by passing the process ID as the request pid.  If
-     multiple events are pending, then the subsequent wait() after
-     resuming a process will report another STOPPED event after
-     resuming the process to handle the next thread event and so on.
+      /* When multiple threads within a process wish to report STOPPED
+	 events from wait(), the kernel picks one thread event as the
+	 thread event to report.  The chosen thread event is retrieved
+	 via PT_LWPINFO by passing the process ID as the request pid.
+	 If multiple events are pending, then the subsequent wait()
+	 after resuming a process will report another STOPPED event
+	 after resuming the process to handle the next thread event
+	 and so on.
 
-     A single thread event is cleared as a side effect of resuming the
-     process with PT_CONTINUE, PT_STEP, etc.  In older kernels,
-     however, the request pid was used to select which thread's event
-     was cleared rather than always clearing the event that was just
-     reported.  To avoid clearing the event of the wrong LWP, always
-     pass the process ID instead of an LWP ID to PT_CONTINUE or
-     PT_SYSCALL.
+	 A single thread event is cleared as a side effect of resuming
+	 the process with PT_CONTINUE, PT_STEP, etc.  In older
+	 kernels, however, the request pid was used to select which
+	 thread's event was cleared rather than always clearing the
+	 event that was just reported.  To avoid clearing the event of
+	 the wrong LWP, always pass the process ID instead of an LWP
+	 ID to PT_CONTINUE or PT_SYSCALL.
 
-     In the case of stepping, the process ID cannot be used with
-     PT_STEP since it would step the thread that reported an event
-     which may not be the thread indicated by PTID.  For stepping, use
-     PT_SETSTEP to enable stepping on the desired thread before
-     resuming the process via PT_CONTINUE instead of using
-     PT_STEP.  */
-  if (step)
-    {
-      if (ptrace (PT_SETSTEP, get_ptrace_pid (ptid), NULL, 0) == -1)
-	perror_with_name (("ptrace (PT_SETSTEP)"));
-      step = 0;
-    }
-  ptid = ptid_t (ptid.pid ());
+	 In the case of stepping, the process ID cannot be used with
+	 PT_STEP since it would step the thread that reported an event
+	 which may not be the thread indicated by PTID.  For stepping,
+	 use PT_SETSTEP to enable stepping on the desired thread
+	 before resuming the process via PT_CONTINUE instead of using
+	 PT_STEP.  */
+      if (step)
+	{
+	  if (ptrace (PT_SETSTEP, get_ptrace_pid (ptid), NULL, 0) == -1)
+	    perror_with_name (("ptrace (PT_SETSTEP)"));
+	  step = 0;
+	}
+      ptid = ptid_t (ptid.pid ());
 #endif
+    }
+
   inf_ptrace_target::resume (ptid, step, signo);
 }
 
+/* Implement the "resume" target_ops method.  */
+
+void
+fbsd_nat_target::resume (ptid_t ptid, int step, enum gdb_signal signo)
+{
+  fbsd_nat_debug_start_end ("[%s], step %d, signo %d (%s)",
+			    target_pid_to_str (ptid).c_str (), step, signo,
+			    gdb_signal_to_name (signo));
+
+  gdb_assert (inferior_ptid.matches (ptid));
+  gdb_assert (!ptid.tid_p ());
+
+  if (ptid == minus_one_ptid)
+    {
+      for (inferior *inf : all_non_exited_inferiors (this))
+	resume_one_process (ptid_t (inf->pid), step, signo);
+    }
+  else
+    {
+      resume_one_process (ptid, step, signo);
+    }
+}
+
 #ifdef USE_SIGTRAP_SIGINFO
 /* Handle breakpoint and trace traps reported via SIGTRAP.  If the
    trap was a breakpoint or trace trap that should be reported to the
@@ -1291,10 +1348,7 @@ fbsd_nat_target::wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
       if (ourstatus->kind () == TARGET_WAITKIND_STOPPED)
 	{
 	  struct ptrace_lwpinfo pl;
-	  pid_t pid;
-	  int status;
-
-	  pid = wptid.pid ();
+	  pid_t pid = wptid.pid ();
 	  if (ptrace (PT_LWPINFO, pid, (caddr_t) &pl, sizeof pl) == -1)
 	    perror_with_name (("ptrace (PT_LWPINFO)"));
 
@@ -1310,6 +1364,13 @@ fbsd_nat_target::wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
 				       pl.pl_siginfo.si_code);
 	    }
 
+	  /* There may not be an inferior for this pid if this is a
+	     PL_FLAG_CHILD event.  */
+	  inferior *inf = find_inferior_ptid (this, wptid);
+	  fbsd_inferior_info *info = inf == nullptr ? nullptr :
+	    fbsd_inferior_data.get (inf);
+	  gdb_assert (info != NULL || pl.pl_flags & PL_FLAG_CHILD);
+
 #ifdef PT_LWP_EVENTS
 	  if (pl.pl_flags & PL_FLAG_EXITED)
 	    {
@@ -1327,6 +1388,20 @@ fbsd_nat_target::wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
 				target_pid_to_str (wptid).c_str ());
 		  low_delete_thread (thr);
 		  delete_thread (thr);
+		  info->num_lwps--;
+
+		  /* If this LWP was the only resumed LWP from the
+		     process, report an event to the core.  */
+		  if (wptid == info->resumed_lwps)
+		    {
+		      ourstatus->set_spurious ();
+		      return wptid;
+		    }
+
+		  /* During process exit LWPs that were not resumed
+		     will report exit events.  */
+		  if (wptid.matches (info->resumed_lwps))
+		    info->running_lwps--;
 		}
 	      if (ptrace (PT_CONTINUE, pid, (caddr_t) 1, 0) == -1)
 		perror_with_name (("ptrace (PT_CONTINUE)"));
@@ -1359,6 +1434,10 @@ fbsd_nat_target::wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
 		  fbsd_lwp_debug_printf ("adding thread for LWP %u",
 					 pl.pl_lwpid);
 		  add_thread (this, wptid);
+		  info->num_lwps++;
+
+		  if (wptid.matches(info->resumed_lwps))
+		    info->running_lwps++;
 		}
 	      ourstatus->set_spurious ();
 	      return wptid;
@@ -1385,6 +1464,8 @@ fbsd_nat_target::wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
 	      child_ptid = fbsd_is_child_pending (child);
 	      if (child_ptid == null_ptid)
 		{
+		  int status;
+
 		  pid = waitpid (child, &status, 0);
 		  if (pid == -1)
 		    perror_with_name (("waitpid"));
@@ -1486,11 +1567,99 @@ fbsd_nat_target::wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
 		perror_with_name (("ptrace (PT_CONTINUE)"));
 	      continue;
 	    }
+
+	  /* If this is a pending SIGSTOP event from an earlier call
+	     to stop_process, discard the event and wait for another
+	     event.  */
+	  if (ourstatus->sig () == GDB_SIGNAL_STOP && info->pending_sigstop)
+	    {
+	      fbsd_nat_debug_printf ("ignoring SIGSTOP for pid %u", pid);
+	      info->pending_sigstop = false;
+	      if (ptrace (PT_CONTINUE, pid, (caddr_t) 1, 0) == -1)
+		perror_with_name (("ptrace (PT_CONTINUE)"));
+	      continue;
+	    }
 	}
+      else
+	fbsd_nat_debug_printf ("event [%s], [%s]",
+			       target_pid_to_str (wptid).c_str (),
+			       ourstatus->to_string ().c_str ());
       return wptid;
     }
 }
 
+/* Stop a given process.  If the process is already stopped, record
+   its pending event instead.  */
+
+void
+fbsd_nat_target::stop_process (inferior *inf)
+{
+  fbsd_inferior_info *info = fbsd_inferior_data.get (inf);
+  gdb_assert (info != nullptr);
+
+  info->resumed_lwps = null_ptid;
+  if (info->running_lwps == 0)
+    return;
+
+  ptid_t ptid (inf->pid);
+  target_waitstatus status;
+  ptid_t wptid = wait_1 (ptid, &status, TARGET_WNOHANG);
+
+  if (wptid != minus_one_ptid)
+    {
+      /* Save the current event as a pending event.  */
+      add_pending_event (wptid, status);
+      info->running_lwps = 0;
+      return;
+    }
+
+  /* If a SIGSTOP is already pending, don't send a new one, but tell
+     wait_1 to report a SIGSTOP.  */
+  if (info->pending_sigstop)
+    {
+      fbsd_nat_debug_printf ("waiting for existing pending SIGSTOP for %u",
+			     inf->pid);
+      info->pending_sigstop = false;
+    }
+  else
+    {
+      /* Ignore errors from kill as process exit might race with kill.  */
+      fbsd_nat_debug_printf ("killing %u with SIGSTOP", inf->pid);
+      ::kill (inf->pid, SIGSTOP);
+    }
+
+  /* Wait for SIGSTOP (or some other event) to be reported.  */
+  wptid = wait_1 (ptid, &status, 0);
+
+  switch (status.kind ())
+    {
+    case TARGET_WAITKIND_EXITED:
+    case TARGET_WAITKIND_SIGNALLED:
+      /* If the process has exited, we aren't going to get an
+	 event for the SIGSTOP.  Save the current event and
+	 return.  */
+      add_pending_event (wptid, status);
+      break;
+    case TARGET_WAITKIND_STOPPED:
+      /* If this is the SIGSTOP event, discard it and return
+	 leaving the process stopped.  */
+      if (status.sig () == GDB_SIGNAL_STOP)
+	break;
+
+      /* FALLTHROUGH */
+    default:
+      /* Some other event has occurred.  Save the current
+	 event.  */
+      add_pending_event (wptid, status);
+
+      /* Ignore the next SIGSTOP for this process.  */
+      fbsd_nat_debug_printf ("ignoring next SIGSTOP for %u", inf->pid);
+      info->pending_sigstop = true;
+      break;
+    }
+  info->running_lwps = 0;
+}
+
 ptid_t
 fbsd_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
 		       target_wait_flags target_options)
@@ -1501,8 +1670,12 @@ fbsd_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
 			 target_options_to_string (target_options).c_str ());
 
   /* If there is a valid pending event, return it.  */
-  if (take_pending_event (wptid, ourstatus))
+  if (take_pending_event (this, ptid, wptid, ourstatus))
     {
+      /* Stop any other inferiors currently running.  */
+      for (inferior *inf : all_non_exited_inferiors (this))
+	stop_process (inf);
+
       fbsd_nat_debug_printf ("returning pending event [%s], [%s]",
 			     target_pid_to_str (wptid).c_str (),
 			     ourstatus->to_string ().c_str ());
@@ -1523,28 +1696,38 @@ fbsd_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
 	  || ourstatus->kind () == TARGET_WAITKIND_NO_RESUMED)
 	break;
 
+      inferior *inf = find_inferior_ptid (this, wptid);
+      gdb_assert (inf != nullptr);
+      fbsd_inferior_info *info = fbsd_inferior_data.get (inf);
+      gdb_assert (info != nullptr);
+      gdb_assert (info->resumed_lwps != null_ptid);
+      gdb_assert (info->running_lwps > 0);
+
       /* If an event is reported for a thread or process while
 	 stepping some other thread, suspend the thread reporting the
 	 event and defer the event until it can be reported to the
 	 core.  */
-      if (!wptid.matches (resume_ptid))
+      if (!wptid.matches (info->resumed_lwps))
 	{
 	  add_pending_event (wptid, *ourstatus);
 	  fbsd_nat_debug_printf ("deferring event [%s], [%s]",
 				 target_pid_to_str (wptid).c_str (),
 				 ourstatus->to_string ().c_str ());
-	  if (wptid.pid () == resume_ptid.pid ())
-	    {
-	      fbsd_nat_debug_printf ("suspending thread [%s]",
-				     target_pid_to_str (wptid).c_str ());
-	      if (ptrace (PT_SUSPEND, wptid.lwp (), NULL, 0) == -1)
-		perror_with_name (("ptrace (PT_SUSPEND)"));
-	      if (ptrace (PT_CONTINUE, wptid.pid (), (caddr_t) 1, 0) == -1)
-		perror_with_name (("ptrace (PT_CONTINUE)"));
-	    }
+	  if (ptrace (PT_SUSPEND, wptid.lwp (), NULL, 0) == -1)
+	    perror_with_name (("ptrace (PT_SUSPEND)"));
+	  if (ptrace (PT_CONTINUE, wptid.pid (), (caddr_t) 1, 0) == -1)
+	    perror_with_name (("ptrace (PT_CONTINUE)"));
 	  continue;
 	}
 
+      /* This process is no longer running.  */
+      info->resumed_lwps = null_ptid;
+      info->running_lwps = 0;
+
+      /* Stop any other inferiors currently running.  */
+      for (inferior *inf : all_non_exited_inferiors (this))
+	stop_process (inf);
+
       break;
     }
 
@@ -1649,23 +1832,49 @@ fbsd_nat_target::create_inferior (const char *exec_file,
     (disable_randomization);
 #endif
 
-  /* Expect a wait for the new process.  */
-  resume_ptid = minus_one_ptid;
-  fbsd_nat_debug_printf ("setting resume_ptid to [%s]",
-			 target_pid_to_str (resume_ptid).c_str ());
+  fbsd_inferior_info *info = fbsd_inferior_data.emplace (current_inferior ());
+  info->resumed_lwps = minus_one_ptid;
+  info->num_lwps = 1;
+  info->running_lwps = 1;
   inf_ptrace_target::create_inferior (exec_file, allargs, env, from_tty);
 }
 
 void
 fbsd_nat_target::attach (const char *args, int from_tty)
 {
-  /* Expect a wait for the new process.  */
-  resume_ptid = minus_one_ptid;
-  fbsd_nat_debug_printf ("setting resume_ptid to [%s]",
-			 target_pid_to_str (resume_ptid).c_str ());
+  fbsd_inferior_info *info = fbsd_inferior_data.emplace (current_inferior ());
+  info->resumed_lwps = minus_one_ptid;
+  info->num_lwps = 1;
+  info->running_lwps = 1;
   inf_ptrace_target::attach (args, from_tty);
 }
 
+void
+fbsd_nat_target::mourn_inferior ()
+{
+  inferior *inf = current_inferior ();
+  gdb_assert (!have_pending_event (ptid_t (inf->pid)));
+  fbsd_inferior_data.clear (inf);
+  inf_ptrace_target::mourn_inferior ();
+}
+
+void
+fbsd_nat_target::follow_exec (inferior *follow_inf, ptid_t ptid,
+			      const char *execd_pathname)
+{
+  inferior *orig_inf = current_inferior ();
+
+  inf_ptrace_target::follow_exec (follow_inf, ptid, execd_pathname);
+
+  if (orig_inf != follow_inf)
+    {
+      /* Migrate the inferior info to the new inferior. */
+      fbsd_inferior_info *info = fbsd_inferior_data.get (orig_inf);
+      fbsd_inferior_data.set (orig_inf, nullptr);
+      fbsd_inferior_data.set (follow_inf, info);
+    }
+}
+
 #ifdef TDP_RFPPWAIT
 /* Target hook for follow_fork.  On entry and at return inferior_ptid is
    the ptid of the followed inferior.  */
@@ -1678,6 +1887,12 @@ fbsd_nat_target::follow_fork (inferior *child_inf, ptid_t child_ptid,
   inf_ptrace_target::follow_fork (child_inf, child_ptid, fork_kind,
 				  follow_child, detach_fork);
 
+  if (child_inf != nullptr)
+    {
+      fbsd_inferior_info *info = fbsd_inferior_data.emplace (child_inf);
+      info->num_lwps = 1;
+    }
+
   if (!follow_child && detach_fork)
     {
       pid_t child_pid = child_ptid.pid ();
diff --git a/gdb/fbsd-nat.h b/gdb/fbsd-nat.h
index cda150ac465..97856573962 100644
--- a/gdb/fbsd-nat.h
+++ b/gdb/fbsd-nat.h
@@ -78,6 +78,8 @@ class fbsd_nat_target : public inf_ptrace_target
 
   void attach (const char *, int) override;
 
+  void mourn_inferior () override;
+
   void resume (ptid_t, int, enum gdb_signal) override;
 
   ptid_t wait (ptid_t, struct target_waitstatus *, target_wait_flags) override;
@@ -89,6 +91,8 @@ class fbsd_nat_target : public inf_ptrace_target
   bool stopped_by_sw_breakpoint () override;
 #endif
 
+  void follow_exec (inferior *, ptid_t, const char *) override;
+
 #ifdef TDP_RFPPWAIT
   void follow_fork (inferior *, ptid_t, target_waitkind, bool, bool) override;
 
@@ -132,6 +136,10 @@ class fbsd_nat_target : public inf_ptrace_target
 private:
   ptid_t wait_1 (ptid_t, struct target_waitstatus *, target_wait_flags);
 
+  void resume_one_process (ptid_t, int, enum gdb_signal);
+
+  void stop_process (inferior *);
+
   /* Helper routines for use in fetch_registers and store_registers in
      subclasses.  These routines fetch and store a single set of
      registers described by REGSET.  The REGSET's 'regmap' field must
-- 
2.38.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 7/9] fbsd-nat: Fix several issues with detaching.
  2023-02-28 18:18 [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
                   ` (5 preceding siblings ...)
  2023-02-28 18:18 ` [PATCH 6/9] fbsd-nat: Fix resuming and waiting with multiple processes John Baldwin
@ 2023-02-28 18:18 ` John Baldwin
  2023-02-28 18:18 ` [PATCH 8/9] fbsd-nat: Fix thread_alive against a running thread John Baldwin
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: John Baldwin @ 2023-02-28 18:18 UTC (permalink / raw)
  To: gdb-patches

- Detach from any child processes implicitly attached to by the kernel
  due to fork following that have not yet been processed by GDB's
  core.

- Delete breakpoints before detaching.

  inf-ptrace::detach does not do this (somewhat surprisingly), so add
  an override to remove breakpoints from a process before detaching
  from it.

  This also requires explicitly draining any pending SIGTRAP events
  for software breakpoints before detaching.  In particular, threads
  may need their PC adjusted due to the software breakpoint before
  being resumed after detach.  On more modern systems using the si_code
  from SIGTRAP to identify software breakpoint traps, the PC is adjusted
  in ::wait_1 as a side effect of parsing the event.  To support older
  kernels, ::detach fixes up the PC for any SIGTRAP stop whose potential
  new PC matches an existing software breakpoint.
---
 gdb/fbsd-nat.c | 260 +++++++++++++++++++++++++++++++++++++++++++++++++
 gdb/fbsd-nat.h |   2 +
 2 files changed, 262 insertions(+)

diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
index 14b31ddd86e..e3d83cbf908 100644
--- a/gdb/fbsd-nat.c
+++ b/gdb/fbsd-nat.c
@@ -1849,6 +1849,266 @@ fbsd_nat_target::attach (const char *args, int from_tty)
   inf_ptrace_target::attach (args, from_tty);
 }
 
+/* If this thread has a pending fork event, there is a child process
+   GDB is attached to that the core of GDB doesn't know about.
+   Detach from it.  */
+
+static void
+detach_fork_children (thread_info *tp)
+{
+  /* Check in thread_info::pending_waitstatus.  */
+  if (tp->has_pending_waitstatus ())
+    {
+      const target_waitstatus &ws = tp->pending_waitstatus ();
+
+      if (ws.kind () == TARGET_WAITKIND_VFORKED
+	  || ws.kind () == TARGET_WAITKIND_FORKED)
+	{
+	  pid_t pid = ws.child_ptid ().pid ();
+	  fbsd_nat_debug_printf ("detaching from child %d", pid);
+	  (void) ptrace (PT_DETACH, pid, (caddr_t) 1, 0);
+	}
+    }
+
+  /* Check in thread_info::pending_follow.  */
+  if (tp->pending_follow.kind () == TARGET_WAITKIND_VFORKED
+      || tp->pending_follow.kind () == TARGET_WAITKIND_FORKED)
+    {
+      pid_t pid = tp->pending_follow.child_ptid ().pid ();
+      fbsd_nat_debug_printf ("detaching from child %d", pid);
+      (void) ptrace (PT_DETACH, pid, (caddr_t) 1, 0);
+    }
+}
+
+/* Detach from any child processes associated with pending fork events
+   for a stopped process.  Returns true if the process has terminated
+   and false if it is still alive.  */
+
+static bool
+detach_fork_children (fbsd_nat_target *target, inferior *inf)
+{
+  /* Detach any child processes associated with pending fork events in
+     threads belonging to this process.  */
+  for (thread_info *tp : inf->non_exited_threads ())
+    detach_fork_children (tp);
+
+  /* Unwind state associated with any pending events.  Reset
+     info->resumed_lwps so that take_pending_event will harvest
+     events.  */
+  fbsd_inferior_info *info = fbsd_inferior_data.get (inf);
+  ptid_t ptid = ptid_t (inf->pid);
+  info->resumed_lwps = ptid;
+
+  ptid_t wptid;
+  target_waitstatus ws;
+  while (take_pending_event (target, ptid, wptid, &ws))
+    {
+      switch (ws.kind ())
+	{
+	case TARGET_WAITKIND_EXITED:
+	case TARGET_WAITKIND_SIGNALLED:
+	  return true;
+	case TARGET_WAITKIND_FORKED:
+	case TARGET_WAITKIND_VFORKED:
+	  {
+	    pid_t pid = ws.child_ptid ().pid ();
+	    fbsd_nat_debug_printf ("detaching from child %d", pid);
+	    (void) ptrace (PT_DETACH, pid, (caddr_t) 1, 0);
+	  }
+	  break;
+	}
+    }
+  return false;
+}
+
+/* Scan all of the threads for a stopped process invoking the supplied
+   callback on the ptrace_lwpinfo object for threads other than the
+   thread which reported the current stop.  The callback can return
+   true to terminate the iteration early.  This function returns true
+   if the callback returned true, otherwise it returns false.  */
+
+typedef bool (ptrace_event_ftype) (const struct ptrace_lwpinfo &pl);
+
+static bool
+iterate_other_ptrace_events (pid_t pid,
+			     gdb::function_view<ptrace_event_ftype> callback)
+{
+  /* Fetch the LWP ID of the thread that just reported the last stop
+     and ignore that LWP in the following loop.  */
+  ptrace_lwpinfo pl;
+  if (ptrace (PT_LWPINFO, pid, (caddr_t) &pl, sizeof (pl)) != 0)
+    perror_with_name (("ptrace (PT_LWPINFO)"));
+  lwpid_t lwpid = pl.pl_lwpid;
+
+  int nlwps = ptrace (PT_GETNUMLWPS, pid, NULL, 0);
+  if (nlwps == -1)
+    perror_with_name (("ptrace (PT_GETLWPLIST)"));
+  if (nlwps == 1)
+    return false;
+
+  gdb::unique_xmalloc_ptr<lwpid_t[]> lwps (XCNEWVEC (lwpid_t, nlwps));
+
+  nlwps = ptrace (PT_GETLWPLIST, pid, (caddr_t) lwps.get (), nlwps);
+  if (nlwps == -1)
+    perror_with_name (("ptrace (PT_GETLWPLIST)"));
+
+  for (int i = 0; i < nlwps; i++)
+    {
+      if (lwps[i] == lwpid)
+	continue;
+
+      if (ptrace (PT_LWPINFO, lwps[i], (caddr_t) &pl, sizeof (pl)) != 0)
+	perror_with_name (("ptrace (PT_LWPINFO)"));
+
+      if (callback (pl))
+	return true;
+    }
+  return false;
+}
+
+/* True if there are any stopped threads with an interesting event.  */
+
+static bool
+pending_ptrace_events (inferior *inf)
+{
+  auto lambda = [] (const struct ptrace_lwpinfo &pl)
+  {
+#ifdef PT_LWP_EVENTS
+    if (pl.pl_flags == PL_FLAG_BORN)
+      return true;
+#endif
+#ifdef TDP_RFPPWAIT
+    if (pl.pl_flags & PL_FLAG_FORKED)
+      return true;
+#endif
+    if (pl.pl_event == PL_EVENT_SIGNAL)
+      {
+	if ((pl.pl_flags & PL_FLAG_SI) == 0)
+	  {
+	    /* Not sure which signal, assume it matters.  */
+	    return true;
+	  }
+	if (pl.pl_siginfo.si_signo == SIGTRAP)
+	  return true;
+      }
+    return false;
+  };
+  return iterate_other_ptrace_events (inf->pid,
+				      gdb::make_function_view (lambda));
+}
+
+void
+fbsd_nat_target::detach (inferior *inf, int from_tty)
+{
+  fbsd_nat_debug_start_end ("pid %d", inf->pid);
+
+  stop_process (inf);
+
+  remove_breakpoints_inf (inf);
+
+  if (detach_fork_children (this, inf)) {
+    /* No need to detach now.  */
+    target_announce_detach (from_tty);
+
+    detach_success (inf);
+    return;
+  }
+
+  /* If there are any pending events (SIGSTOP from stop_process or a
+     breakpoint hit that needs a PC fixup), drain events until the
+     process can be safely detached.  */
+  fbsd_inferior_info *info = fbsd_inferior_data.get (inf);
+  ptid_t ptid = ptid_t (inf->pid);
+  if (info->pending_sigstop || pending_ptrace_events (inf))
+    {
+      bool pending_sigstop = info->pending_sigstop;
+      int sig = 0;
+
+      if (pending_sigstop)
+	fbsd_nat_debug_printf ("waiting for SIGSTOP");
+
+      /* Force wait_1 to report the SIGSTOP instead of swallowing it.  */
+      info->pending_sigstop = false;
+
+      /* Report event for all threads from wait_1.  */
+      info->resumed_lwps = ptid;
+
+      do
+	{
+	  if (ptrace (PT_CONTINUE, inf->pid, (caddr_t) 1, sig) != 0)
+	    perror_with_name (("ptrace(PT_CONTINUE)"));
+
+	  target_waitstatus ws;
+	  ptid_t wptid = wait_1 (ptid, &ws, 0);
+
+	  switch (ws.kind ())
+	    {
+	    case TARGET_WAITKIND_EXITED:
+	    case TARGET_WAITKIND_SIGNALLED:
+	      /* No need to detach now.  */
+	      target_announce_detach (from_tty);
+
+	      detach_success (inf);
+	      return;
+	    case TARGET_WAITKIND_FORKED:
+	    case TARGET_WAITKIND_VFORKED:
+	      {
+		pid_t pid = ws.child_ptid ().pid ();
+		fbsd_nat_debug_printf ("detaching from child %d", pid);
+		(void) ptrace (PT_DETACH, pid, (caddr_t) 1, 0);
+		sig = 0;
+	      }
+	      break;
+	    case TARGET_WAITKIND_STOPPED:
+	      sig = gdb_signal_to_host (ws.sig ());
+	      switch (sig)
+		{
+		case SIGSTOP:
+		  if (pending_sigstop)
+		    {
+		      sig = 0;
+		      pending_sigstop = false;
+		    }
+		  break;
+		case SIGTRAP:
+#ifndef USE_SIGTRAP_SIGINFO
+		  {
+		    /* Update PC from software breakpoint hit.  */
+		    struct regcache *regcache = get_thread_regcache (this, wptid);
+		    struct gdbarch *gdbarch = regcache->arch ();
+		    int decr_pc = gdbarch_decr_pc_after_break (gdbarch);
+
+		    if (decr_pc != 0)
+		      {
+			CORE_ADDR pc;
+
+			pc = regcache_read_pc (regcache);
+			if (breakpoint_inserted_here_p (regcache->aspace (),
+							pc - decr_pc))
+			  {
+			    fbsd_nat_debug_printf ("adjusted PC for LWP %ld",
+						   wptid.lwp ());
+			    regcache_write_pc (regcache, pc - decr_pc);
+			  }
+		      }
+		  }
+#endif
+		  sig = 0;
+		  break;
+		}
+	    }
+	}
+      while (pending_sigstop || pending_ptrace_events (inf));
+    }
+
+  target_announce_detach (from_tty);
+
+  if (ptrace (PT_DETACH, inf->pid, (caddr_t) 1, 0) == -1)
+	perror_with_name (("ptrace (PT_DETACH)"));
+
+  detach_success (inf);
+}
+
 void
 fbsd_nat_target::mourn_inferior ()
 {
diff --git a/gdb/fbsd-nat.h b/gdb/fbsd-nat.h
index 97856573962..3dc22ce1cca 100644
--- a/gdb/fbsd-nat.h
+++ b/gdb/fbsd-nat.h
@@ -78,6 +78,8 @@ class fbsd_nat_target : public inf_ptrace_target
 
   void attach (const char *, int) override;
 
+  void detach (inferior *, int) override;
+
   void mourn_inferior () override;
 
   void resume (ptid_t, int, enum gdb_signal) override;
-- 
2.38.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 8/9] fbsd-nat: Fix thread_alive against a running thread.
  2023-02-28 18:18 [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
                   ` (6 preceding siblings ...)
  2023-02-28 18:18 ` [PATCH 7/9] fbsd-nat: Fix several issues with detaching John Baldwin
@ 2023-02-28 18:18 ` John Baldwin
  2023-02-28 18:18 ` [PATCH 9/9] fbsd-nat: Stop a process if it is running before killing it John Baldwin
  2023-03-14 18:21 ` [PING] [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
  9 siblings, 0 replies; 24+ messages in thread
From: John Baldwin @ 2023-02-28 18:18 UTC (permalink / raw)
  To: gdb-patches

FreeBSD's ptrace fails requests with EBUSY against a running process.
Report that the thread is alive instead of dead if ptrace fails with
EBUSY.

This fixes an internal error in the gdb.threads/detach-step-over.exp
test where one process was detached while a thread in a second process
was being stepped.  The core incorrectly assumed the stepping thread
had vanished and discarded the pending stepping state.  When the
thread later reported a SIGTRAP from completing the step, this
triggered an assertion.
---
 gdb/fbsd-nat.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
index e3d83cbf908..32dd482ec3c 100644
--- a/gdb/fbsd-nat.c
+++ b/gdb/fbsd-nat.c
@@ -861,7 +861,13 @@ fbsd_nat_target::thread_alive (ptid_t ptid)
 
       if (ptrace (PT_LWPINFO, ptid.lwp (), (caddr_t) &pl, sizeof pl)
 	  == -1)
-	return false;
+	{
+	  /* EBUSY means the associated process is running which means
+	     the LWP does exist and belongs to a running process.  */
+	  if (errno == EBUSY)
+	    return true;
+	  return false;
+	}
 #ifdef PL_FLAG_EXITED
       if (pl.pl_flags & PL_FLAG_EXITED)
 	return false;
-- 
2.38.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 9/9] fbsd-nat: Stop a process if it is running before killing it.
  2023-02-28 18:18 [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
                   ` (7 preceding siblings ...)
  2023-02-28 18:18 ` [PATCH 8/9] fbsd-nat: Fix thread_alive against a running thread John Baldwin
@ 2023-02-28 18:18 ` John Baldwin
  2023-03-14 18:21 ` [PING] [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
  9 siblings, 0 replies; 24+ messages in thread
From: John Baldwin @ 2023-02-28 18:18 UTC (permalink / raw)
  To: gdb-patches

In addition, detach from any child processes implicitly attached to by
the kernel due to fork following that have not yet been processed by
GDB's core.
---
 gdb/fbsd-nat.c | 93 +++++++++++++++++++++++++++++++++++++++++---------
 gdb/fbsd-nat.h |  2 ++
 2 files changed, 78 insertions(+), 17 deletions(-)

diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
index 32dd482ec3c..42c69875b0f 100644
--- a/gdb/fbsd-nat.c
+++ b/gdb/fbsd-nat.c
@@ -1158,6 +1158,31 @@ fbsd_is_child_pending (pid_t pid)
   return null_ptid;
 }
 
+/* Wait for a child of a fork to report its stop.  Returns the PTID of
+   the new child process.  */
+
+static ptid_t
+fbsd_wait_for_fork_child (pid_t pid)
+{
+  ptid_t ptid = fbsd_is_child_pending (pid);
+  if (ptid != null_ptid)
+    return ptid;
+
+  int status;
+  pid_t wpid = waitpid (pid, &status, 0);
+  if (wpid == -1)
+    perror_with_name (("waitpid"));
+
+  gdb_assert (wpid == pid);
+
+  struct ptrace_lwpinfo pl;
+  if (ptrace (PT_LWPINFO, wpid, (caddr_t) &pl, sizeof pl) == -1)
+    perror_with_name (("ptrace (PT_LWPINFO)"));
+
+  gdb_assert (pl.pl_flags & PL_FLAG_CHILD);
+  return ptid_t (wpid, pl.pl_lwpid);
+}
+
 #ifndef PTRACE_VFORK
 /* Record a pending vfork done event.  */
 
@@ -1467,23 +1492,7 @@ fbsd_nat_target::wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
 #endif
 
 	      /* Make sure the other end of the fork is stopped too.  */
-	      child_ptid = fbsd_is_child_pending (child);
-	      if (child_ptid == null_ptid)
-		{
-		  int status;
-
-		  pid = waitpid (child, &status, 0);
-		  if (pid == -1)
-		    perror_with_name (("waitpid"));
-
-		  gdb_assert (pid == child);
-
-		  if (ptrace (PT_LWPINFO, child, (caddr_t) &pl, sizeof pl) == -1)
-		    perror_with_name (("ptrace (PT_LWPINFO)"));
-
-		  gdb_assert (pl.pl_flags & PL_FLAG_CHILD);
-		  child_ptid = ptid_t (child, pl.pl_lwpid);
-		}
+	      child_ptid = fbsd_wait_for_fork_child (child);
 
 	      /* Enable additional events on the child process.  */
 	      fbsd_enable_proc_events (child_ptid.pid ());
@@ -2115,6 +2124,56 @@ fbsd_nat_target::detach (inferior *inf, int from_tty)
   detach_success (inf);
 }
 
+/* Implement the "kill" target method.  */
+
+void
+fbsd_nat_target::kill ()
+{
+  pid_t pid = inferior_ptid.pid ();
+  if (pid == 0)
+    return;
+
+  inferior *inf = current_inferior ();
+  stop_process (inf);
+
+  if (detach_fork_children (this, inf)) {
+    /* No need to kill now.  */
+    target_mourn_inferior (inferior_ptid);
+
+    return;
+  }
+
+#ifdef TDP_RFPPWAIT
+  /* If there are any threads that have forked a new child but not yet
+     reported it because other threads reported events first, detach
+     from the children before killing the parent.  */
+  auto lambda = [] (const struct ptrace_lwpinfo &pl)
+  {
+    if (pl.pl_flags & PL_FLAG_FORKED)
+      {
+	pid_t child = pl.pl_child_pid;
+
+	/* If the child hasn't reported its stop yet, wait for it to
+	   stop.  */
+	fbsd_wait_for_fork_child (child);
+
+	/* Detach from the child.  */
+	(void) ptrace (PT_DETACH, child, (caddr_t) 1, 0);
+      }
+    return false;
+  };
+  iterate_other_ptrace_events (pid, gdb::make_function_view (lambda));
+#endif
+
+  if (ptrace (PT_KILL, pid, NULL, 0) == -1)
+	perror_with_name (("ptrace (PT_KILL)"));
+
+  int status;
+  waitpid (pid, &status, 0);
+
+  target_mourn_inferior (inferior_ptid);
+}
+
 void
 fbsd_nat_target::mourn_inferior ()
 {
diff --git a/gdb/fbsd-nat.h b/gdb/fbsd-nat.h
index 3dc22ce1cca..8096c53bc03 100644
--- a/gdb/fbsd-nat.h
+++ b/gdb/fbsd-nat.h
@@ -80,6 +80,8 @@ class fbsd_nat_target : public inf_ptrace_target
 
   void detach (inferior *, int) override;
 
+  void kill () override;
+
   void mourn_inferior () override;
 
   void resume (ptid_t, int, enum gdb_signal) override;
-- 
2.38.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PING] [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target
  2023-02-28 18:18 [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
                   ` (8 preceding siblings ...)
  2023-02-28 18:18 ` [PATCH 9/9] fbsd-nat: Stop a process if it is running before killing it John Baldwin
@ 2023-03-14 18:21 ` John Baldwin
  9 siblings, 0 replies; 24+ messages in thread
From: John Baldwin @ 2023-03-14 18:21 UTC (permalink / raw)
  To: gdb-patches

On 2/28/23 10:18 AM, John Baldwin wrote:
> When I originally added multiprocess support to the FreeBSD target I
> did not fully understand what that meant.  This series seeks to fix
> several of my incorrect assumptions (e.g. the need to resume multiple
> inferiors for a given call to resume, as well as the need to stop
> other inferiors when reporting an event back from wait).
> 
> This does not add support for non-stop mode, but probably gets closer
> to being able to support it.
> 
> In terms of test results, the before summary of a testsuite run on
> FreeBSD/amd64 is:
> 
>                  === gdb Summary ===
> 
> # of unexpected core files      11
> # of expected passes            98135
> # of unexpected failures        3104
> # of expected failures          58
> # of known failures             125
> # of unresolved testcases       74
> # of untested testcases         58
> # of unsupported tests          818
> # of duplicate test names       10
> 
> and the summary after this series is:
> 
>                  === gdb Summary ===
> 
> # of unexpected core files      9
> # of expected passes            98586
> # of unexpected failures        3002
> # of expected failures          58
> # of known failures             125
> # of unresolved testcases       54
> # of untested testcases         58
> # of unsupported tests          818
> # of duplicate test names       9
> 
> Some of these other failures are improved further by the other
> testsuite patches I posted earlier involving races in tests, or
> matching on the output of 'info threads'.
> 
> I do see a few failures of the new assertions added in this series
> still, namely a handful of tests do something I'm not sure is
> expected: they schedule multiple individual threads from a single
> process multiple resume() calls.  I think though this may be a test
> that is assuming the target supports non-stop when it doesn't.  For
> example, here are the debugging logs from a test that I caught that
> does this:
> 
> Running /home/john/work/git/gdb/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp ...
> ...
> (gdb) file /usr/home/john/work/git/gdb/obj/gdb/testsuite/outputs/gdb.threads/attach-many-short-lived-threads/attach-many-short-lived-threads
> Reading symbols from /usr/home/john/work/git/gdb/obj/gdb/testsuite/outputs/gdb.threads/attach-many-short-lived-threads/attach-many-short-lived-threads...
> (gdb) builtin_spawn /usr/home/john/work/git/gdb/obj/gdb/testsuite/outputs/gdb.threads/attach-many-short-lived-threads/attach-many-short-lived-threads
> attach 44426
> attach: running_lwps 1
> Attaching to program: /usr/home/john/work/git/gdb/obj/gdb/testsuite/outputs/gdb.threads/attach-many-short-lived-threads/attach-many-short-lived-threads, process 44426
> [New LWP 959198 of process 44426]
> [New LWP 959404 of process 44426]
> [New LWP 959532 of process 44426]
> [New LWP 959544 of process 44426]
> [New LWP 959568 of process 44426]
> ...
> [Switching to LWP 432744 of process 44426]
> _nanosleep () at _nanosleep.S:4
> 4       PSEUDO(nanosleep)
> (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: attach
> info threads
>    Id   Target Id                              Frame
> * 1    LWP 432744 of process 44426            _nanosleep () at _nanosleep.S:4
>    2    LWP 959198 of process 44426 "detached" _umtx_op_err () at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:40
> ...
> (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: no new threads
> set breakpoint always-inserted on
> (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set breakpoint always-inserted on
> break break_fn
> Breakpoint 1 at 0x400ebc: file /home/john/work/git/gdb/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.c, line 57.
> (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: break break_fn
> set debug infrun on
> (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set debug infrun on
> set debug fbsd-nat on
> (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set debug fbsd-nat on
> continue
> Continuing.
> [infrun] clear_proceed_status_thread: 44426.432744.0
> [infrun] clear_proceed_status_thread: 44426.959198.0
> [infrun] clear_proceed_status_thread: 44426.959404.0
> ...
> [infrun] clear_proceed_status_thread: 44426.960473.0
> [infrun] clear_proceed_status_thread: 44426.960474.0
> [infrun] clear_proceed_status_thread: 44426.960475.0
> [infrun] proceed: enter
>    [infrun] user_visible_resume_ptid: resuming all threads of current process
>    [infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT
>    [infrun] scoped_disable_commit_resumed: reason=proceeding
>    [infrun] start_step_over: enter
>      [infrun] start_step_over: stealing global queue of threads to step, length = 0
>      [infrun] operator(): step-over queue now empty
>    [infrun] start_step_over: exit
>    [infrun] proceed: start: resuming threads, all-stop-on-top-of-non-stop
>      [infrun] proceed: resuming 44426.432744.0
>      [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [44426.432744.0] at 0x823baf64a
>      [infrun] internal_resume_ptid: non_stop uses inferior_ptid
>      [infrun] do_target_resume: resume_ptid=44426.432744.0, step=0, sig=GDB_SIGNAL_0
>      [fbsd-nat] resume: start: [LWP 432744 of process 44426], step 0, signo 0 (0)
>        [fbsd-nat] resume_one_process: [LWP 432744 of process 44426], step 0, signo 0 (0)
>      [fbsd-nat] resume: end: [LWP 432744 of process 44426], step 0, signo 0 (0)
>      [infrun] infrun_async: enable=1
>      [infrun] prepare_to_wait: prepare_to_wait
>      [infrun] proceed: resuming 44426.959198.0
>      [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [44426.959198.0] at 0x821d32ccc
>      [infrun] internal_resume_ptid: non_stop uses inferior_ptid
>      [infrun] do_target_resume: resume_ptid=44426.959198.0, step=0, sig=GDB_SIGNAL_0
>      [fbsd-nat] resume: start: [LWP 959198 of process 44426], step 0, signo 0 (0)
>        [fbsd-nat] resume_one_process: [LWP 959198 of process 44426], step 0, signo 0 (0)
> pid 44426: running_lwps 1
> /home/john/work/git/gdb/gdb/fbsd-nat.c:1199: internal-error: resume_one_process: Assertion `info->running_lwps == 0' failed.
> 
> I've started on a patch to defer actual process resumes until either
> wait or commit_resumed is called, but that has so far caused more
> regressions than fixes, so I'm not including it here.  Also, the
> logged line of 'start: resuming threads, all-stop-on-top-of-non-stop'
> when the target is all-stop only makes me think the test is perhaps
> broken in this case.

Since these patches are all specific to fbsd-nat I can push them, but
I was curious if anyone had any feedback?  In particular, the target
interface is a bit complex and if I have incorrect assumptions in
these patches it would be nice to know if there are things I should
be fixing further.

-- 
John Baldwin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/9] fbsd-nat: Add missing spaces.
  2023-02-28 18:18 ` [PATCH 1/9] fbsd-nat: Add missing spaces John Baldwin
@ 2023-03-20 17:43   ` Simon Marchi
  2023-03-20 18:08     ` John Baldwin
  0 siblings, 1 reply; 24+ messages in thread
From: Simon Marchi @ 2023-03-20 17:43 UTC (permalink / raw)
  To: John Baldwin, gdb-patches

On 2/28/23 13:18, John Baldwin wrote:
> No functional change, just style fixes.

Approved-By: Simon Marchi <simon.marchi@efficios.com>

But really, feel free to push this kind of patches as obvious.

Simon

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/9] fbsd-nat: Avoid a direct write to target_waitstatus::kind.
  2023-02-28 18:18 ` [PATCH 2/9] fbsd-nat: Avoid a direct write to target_waitstatus::kind John Baldwin
@ 2023-03-20 17:45   ` Simon Marchi
  0 siblings, 0 replies; 24+ messages in thread
From: Simon Marchi @ 2023-03-20 17:45 UTC (permalink / raw)
  To: John Baldwin, gdb-patches

On 2/28/23 13:18, John Baldwin wrote:
> This is in #ifdef'd code for a workaround for FreeBSD versions older
> than 11.1 which is why it wasn't caught earlier.
> ---
>  gdb/fbsd-nat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
> index 4e48bfa1590..2f5b512fb33 100644
> --- a/gdb/fbsd-nat.c
> +++ b/gdb/fbsd-nat.c
> @@ -1261,7 +1261,7 @@ fbsd_nat_target::wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
>        wptid = fbsd_next_vfork_done ();
>        if (wptid != null_ptid)
>  	{
> -	  ourstatus->kind = TARGET_WAITKIND_VFORK_DONE;
> +	  ourstatus->set_vfork_done ();
>  	  return wptid;
>  	}
>  #endif
> -- 
> 2.38.1
> 

Approved-By: Simon Marchi <simon.marchi@efficios.com>

I think this could also be considered obvious, and you are the
maintainer of this fail anyway, so it's fine to just push fixes that are
likely not controversial.

Simon

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/9] fbsd-nat: Use correct constant for target_waitstatus::sig.
  2023-02-28 18:18 ` [PATCH 3/9] fbsd-nat: Use correct constant for target_waitstatus::sig John Baldwin
@ 2023-03-20 17:47   ` Simon Marchi
  0 siblings, 0 replies; 24+ messages in thread
From: Simon Marchi @ 2023-03-20 17:47 UTC (permalink / raw)
  To: John Baldwin, gdb-patches

On 2/28/23 13:18, John Baldwin wrote:
> Use GDB_SIGNAL_TRAP instead of SIGTRAP.  This is a no-op since the
> value of SIGTRAP on FreeBSD matches the value of GDB_SIGNAL_TRAP, but
> it is more correct.
> ---
>  gdb/fbsd-nat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
> index 2f5b512fb33..04d67fc5278 100644
> --- a/gdb/fbsd-nat.c
> +++ b/gdb/fbsd-nat.c
> @@ -1439,7 +1439,7 @@ fbsd_nat_target::wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
>  	     SIGTRAP, so only treat SIGTRAP events as system call
>  	     entry/exit events.  */
>  	  if (pl.pl_flags & (PL_FLAG_SCE | PL_FLAG_SCX)
> -	      && ourstatus->sig () == SIGTRAP)
> +	      && ourstatus->sig () == GDB_SIGNAL_TRAP)
>  	    {
>  #ifdef HAVE_STRUCT_PTRACE_LWPINFO_PL_SYSCALL_CODE
>  	      if (catch_syscall_enabled ())
> -- 
> 2.38.1
> 

Approved-By: Simon Marchi <simon.marchi@efficios.com>

Simon

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/9] fbsd-nat: Add missing spaces.
  2023-03-20 17:43   ` Simon Marchi
@ 2023-03-20 18:08     ` John Baldwin
  0 siblings, 0 replies; 24+ messages in thread
From: John Baldwin @ 2023-03-20 18:08 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 3/20/23 10:43 AM, Simon Marchi wrote:
> On 2/28/23 13:18, John Baldwin wrote:
>> No functional change, just style fixes.
> 
> Approved-By: Simon Marchi <simon.marchi@efficios.com>
> 
> But really, feel free to push this kind of patches as obvious.

Oh, yes, sorry, I almost did.  I had been working on this series
for a while and had pulled these out to the front of the series when
cleaning it up for review.

-- 
John Baldwin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/9] fbsd-nat: Add a list of pending events.
  2023-02-28 18:18 ` [PATCH 4/9] fbsd-nat: Add a list of pending events John Baldwin
@ 2023-03-20 18:35   ` Simon Marchi
  2023-03-27 20:24     ` John Baldwin
  0 siblings, 1 reply; 24+ messages in thread
From: Simon Marchi @ 2023-03-20 18:35 UTC (permalink / raw)
  To: John Baldwin, gdb-patches

On 2/28/23 13:18, John Baldwin wrote:
> The pending_events list stores a queue of deferred events that might
> be reported by the next call to the target's wait method.  The set of
> events that are eligible is filtered by the ptid passed to resume.
> 
> For now this just replaces the list of vfork_done events.  A
> subsequent commit will reuse this to store other events.
> ---
>  gdb/fbsd-nat.c | 146 ++++++++++++++++++++++++++++++++-----------------
>  gdb/fbsd-nat.h |   2 +
>  2 files changed, 98 insertions(+), 50 deletions(-)
> 
> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
> index 04d67fc5278..ca278b871ef 100644
> --- a/gdb/fbsd-nat.c
> +++ b/gdb/fbsd-nat.c
> @@ -54,6 +54,66 @@
>  #define	PT_SETREGSET	43	/* Set a target register set */
>  #endif
>  
> +/* Filter for ptid's allowed to report events from wait.  Normally set
> +   in resume, but also reset to minus_one_ptid in create_inferior and
> +   attach.  */
> +
> +static ptid_t resume_ptid;

It doesn't change anything conceptually, but I think this should be a
private field of fbsd_nat_target.

This resume_ptid approach might work now, but it won't work in general,
since it only records the scope ptid passed to the last call to resume.

Even today, do you support multi-inferior?  If the user does:

(gdb) inferior 1
(gdb) continue &
(gdb) inferior 2
(gdb) continue &

Then resume_ptid would be (inf2_pid, 0, 0), and you'd be ignoring any
event from inferior 1, right?

I think the general solution is necessarily to remember the
resume state for each individual thread, somehow.  Up to you to see if
you want to implement that now or defer to later.  If you only support
all-stop with a single inferior, it might be enough for now.

For reference, the linux-nat target maintains its own data structures of
lwps (lwp_lwpid_htab), in which the last_resume_kind field is used for
that.  It would be unfortunate to have to reimplement all of this for
fbsd-nat as I'm sure that a lot of the logic would be very similar.  But
then making the code shareable would also be a lot of work.

> +
> +/* If an event is triggered asynchronously (fake vfork_done events) or
> +   occurs when the core is not expecting it, a pending event is
> +   created.  This event is then returned by a future call to the
> +   target wait method.  */

It can probably also happen if two events happen at the "same" time?
Like if you react to a breakpoint hit by one thread, and while stopping
the other threads, another thread reports a breakpoint stop, you have to
store that one as a pending event?

> +
> +struct pending_event
> +{
> +  pending_event (const ptid_t &_ptid, const target_waitstatus &_status) :
> +    ptid(_ptid), status(_status) {}

Spaces after parentheses.

> +
> +  ptid_t ptid;
> +  target_waitstatus status;
> +};
> +
> +static std::list<pending_event> pending_events;

This could use intrusive_list, which is a bit more lightweight than
std::list (it won't really make a difference in practice, but we have
it, so we might as well use it).

> +
> +/* Add a new pending event to the list.  */
> +
> +static void
> +add_pending_event (const ptid_t &ptid, const target_waitstatus &status)
> +{
> +  pending_events.emplace_back (ptid, status);
> +}
> +
> +/* Return true if there is a pending event matching FILTER.  */
> +
> +static bool
> +have_pending_event (ptid_t filter)
> +{
> +  for (const pending_event &event : pending_events)
> +    if (event.ptid.matches (filter))
> +      return true;
> +  return false;
> +}
> +
> +/* Helper method called by the target wait method.  Returns true if
> +   there is a pending event matching resume_ptid.  If there is a

resume_ptid -> RESUME_PTID

> +   matching event, PTID and *STATUS contain the event details, and the
> +   event is removed from the pending list.  */
> +
> +static bool
> +take_pending_event (ptid_t &ptid, target_waitstatus *status)
> +{
> +  for (auto it = pending_events.begin (); it != pending_events.end (); it++)
> +    if (it->ptid.matches (resume_ptid))
> +      {
> +	ptid = it->ptid;
> +	*status = it->status;
> +	pending_events.erase (it);
> +	return true;
> +      }
> +  return false;
> +}

This could maybe return a gdb::optional<pending_event>?

> +
>  /* Return the name of a file that can be opened to get the symbols for
>     the child process identified by PID.  */
>  
> @@ -1061,47 +1121,20 @@ fbsd_is_child_pending (pid_t pid)
>  }
>  
>  #ifndef PTRACE_VFORK
> -static std::forward_list<ptid_t> fbsd_pending_vfork_done;
> -
>  /* Record a pending vfork done event.  */
>  
>  static void
>  fbsd_add_vfork_done (ptid_t pid)
>  {
> -  fbsd_pending_vfork_done.push_front (pid);
> +  target_waitstatus status;
> +  status.set_vfork_done ();
> +  add_pending_event (ptid, status);

I think you can do this?

  add_pending_event (ptid, target_waitstatus ().set_vfork_done ());

Not a C++ expert, but I believe the temporary target_waitstatus is kept
alive long enough for this to work.

> @@ -1110,21 +1143,18 @@ fbsd_next_vfork_done (void)
>  void
>  fbsd_nat_target::resume (ptid_t ptid, int step, enum gdb_signal signo)

It would be a good time to rename the first parameter from ptid to
scope_ptid, to match what was done here:

https://gitlab.com/gnutools/binutils-gdb/-/commit/d51926f06a7f4854bebdd71dcb0a78dbaa2f4168

I think it helps understand the meaning of the parameter.

>  {
> -#if defined(TDP_RFPPWAIT) && !defined(PTRACE_VFORK)
> -  pid_t pid;
> -
> -  /* Don't PT_CONTINUE a process which has a pending vfork done event.  */
> -  if (minus_one_ptid == ptid)
> -    pid = inferior_ptid.pid ();
> -  else
> -    pid = ptid.pid ();
> -  if (fbsd_is_vfork_done_pending (pid))
> -    return;
> -#endif

Yeah, this old code seems to misinterpret the target_ops::resume
interface, which was clarified in the commit mentioned above.

> +void
> +fbsd_nat_target::attach (const char *args, int from_tty)
> +{
> +  /* Expect a wait for the new process.  */
> +  resume_ptid = minus_one_ptid;
> +  fbsd_nat_debug_printf ("setting resume_ptid to [%s]",
> +			 target_pid_to_str (resume_ptid).c_str ());
> +  inf_ptrace_target::attach (args, from_tty);
> +}
> +

With the "general" solution where you keep rack of the resume state of
all threads, this part would be a bit cleaner I think.  Any thread
created due to an attach or due to create_inferior would start in the
"resumed" state.

Simon

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/9] fbsd-nat: Defer any ineligible events reported by wait.
  2023-02-28 18:18 ` [PATCH 5/9] fbsd-nat: Defer any ineligible events reported by wait John Baldwin
@ 2023-03-20 19:09   ` Simon Marchi
  2023-03-27 20:49     ` John Baldwin
  0 siblings, 1 reply; 24+ messages in thread
From: Simon Marchi @ 2023-03-20 19:09 UTC (permalink / raw)
  To: John Baldwin, gdb-patches

On 2/28/23 13:18, John Baldwin wrote:
> If wait_1 finds an event for a thread or process that does not match
> the set of threads and processes previously resumed, defer the event.
> If the event is for a specific thread, suspend the thread and continue
> the associated process before waiting for another event.

I do not understand this, because it doesn't match my mental model of
how things should work (which is derived from working with linux-nat).
In that model, a thread with a pending event should never be
ptrace-resumed.  So the last sentence doesn't make sense, it implies
that a thread has a pending and is ptrace-resumed (since we suspend it).

> One specific example of such an event is if a thread is created while
> another thread in the same process hits a breakpoint.  If the second
> thread's event is reported first, the target resume method does not
> yet "know" about the new thread and will not suspend it via
> PT_SUSPEND.

I was surprised to read that the resume method suspends some threads.  I
don't think it should, but I do see the current code.  The current code
appears to interpret the resume request as: "ensure the state of thread
resumption looks exactly like this", so it suspends threads as needed,
to match the requested resumption state.  I don't think it's supposed to
work like that.  The resume method should only resume some threads, and
if the core of GDB wants some threads stopped, it will call the stop
method.

Anyhow, just to be sure I understand the problem you describe: when
fbsd-nat reports the breakpoint hit from "another thread", is the
"thread created" event still in the kernel, not consumed by GDB?

> When wait is called, it will probably return the event
> from the first thread before the result of the step from second
> thread.  This is the case reported in PR 21497.

> ---
>  gdb/fbsd-nat.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
> index ca278b871ef..3f7278c6ea0 100644
> --- a/gdb/fbsd-nat.c
> +++ b/gdb/fbsd-nat.c
> @@ -1514,7 +1514,39 @@ fbsd_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
>    if (is_async_p ())
>      async_file_flush ();
>  
> -  wptid = wait_1 (ptid, ourstatus, target_options);
> +  while (1)
> +    {
> +      wptid = wait_1 (ptid, ourstatus, target_options);
> +
> +      /* If no event was found, just return.  */
> +      if (ourstatus->kind () == TARGET_WAITKIND_IGNORE
> +	  || ourstatus->kind () == TARGET_WAITKIND_NO_RESUMED)
> +	break;
> +
> +      /* If an event is reported for a thread or process while
> +	 stepping some other thread, suspend the thread reporting the
> +	 event and defer the event until it can be reported to the
> +	 core.  */
> +      if (!wptid.matches (resume_ptid))
> +	{
> +	  add_pending_event (wptid, *ourstatus);
> +	  fbsd_nat_debug_printf ("deferring event [%s], [%s]",
> +				 target_pid_to_str (wptid).c_str (),
> +				 ourstatus->to_string ().c_str ());
> +	  if (wptid.pid () == resume_ptid.pid ())
> +	    {
> +	      fbsd_nat_debug_printf ("suspending thread [%s]",
> +				     target_pid_to_str (wptid).c_str ());
> +	      if (ptrace (PT_SUSPEND, wptid.lwp (), NULL, 0) == -1)
> +		perror_with_name (("ptrace (PT_SUSPEND)"));

This is the part I don't understand.  After wait_1 returned an event for
a specific thread, I would expect this thread to be ptrace-stopped.  So,
I don't see the need to suspend it.  You'd just leave it in the "resumed
from the
until.

> +	      if (ptrace (PT_CONTINUE, wptid.pid (), (caddr_t) 1, 0) == -1)
> +		perror_with_name (("ptrace (PT_CONTINUE)"));

I don't get why you continue `wptid.pid ()`.  What assures you that this
particular thread was stopped previously?  Perhaps I don't understand
because I assume somethings about pids/lwps and ptrace that are only
true on Linux.

Simon

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/9] fbsd-nat: Fix resuming and waiting with multiple processes.
  2023-02-28 18:18 ` [PATCH 6/9] fbsd-nat: Fix resuming and waiting with multiple processes John Baldwin
@ 2023-03-20 19:55   ` Simon Marchi
  2023-03-27 21:13     ` John Baldwin
  0 siblings, 1 reply; 24+ messages in thread
From: Simon Marchi @ 2023-03-20 19:55 UTC (permalink / raw)
  To: John Baldwin, gdb-patches

On 2/28/23 13:18, John Baldwin wrote:
> I did not fully understand the requirements of multiple process
> support when I enabled it previously and several parts were broken.
> In particular, the resume method was only resuming a single process,
> and wait was not stopping other processes when reporting an event.
> 
> To support multiple running inferiors, add a new per-inferior
> structure which trackes the number of existing and running LWPs for
> each process.  The structure also stores a ptid_t describing the
> set of LWPs currently resumed for each process.

Ah, that sounds good, related to my comments on previous patches about
tracking the resume state for each LWP.  I don't know if it needs to be
per-inferior though, versus a global table like Linux does (but perhaps
it helps, I'll see when reading the code).

> 
> For the resume method, iterate over all non-exited inferiors resuming
> each process matching the passed in ptid rather than only resuming the
> current inferior's process for a wildcard ptid.  If a resumed process
> has a pending event, don't actually resume the process, but other
> matching processes without a pending event are still resumed in case
> the later call to the wait method requests an event from one of the
> processes without a pending event.
> 
> For the wait method, stop other running processes before returning an
> event to the core.  When stopping a process, first check to see if an
> event is already pending.  If it is, queue the event to be reported
> later.  If not, send a SIGSTOP to the process and wait for it to stop.
> If the event reported by the wait is not for the SIGSTOP, queue the
> event and remember to ignore a future SIGSTOP event for the process.
> 
> Note that, unlike the Linux native target, entire processes are
> stopped rather than individual LWPs.  In FreeBSD one can only wait on
> processes (via pid), not for an event from a specific thread.
> 
> Other changes in this commit handle bookkeeping for the per-inferior
> data such as purging the data in the mourn_inferior method and
> migrating the data to the new inferior in the follow_exec method.  The
> per-inferior data is created in the attach, create_inferior, and
> follow_fork methods.
> ---
>  gdb/fbsd-nat.c | 403 +++++++++++++++++++++++++++++++++++++------------
>  gdb/fbsd-nat.h |   8 +
>  2 files changed, 317 insertions(+), 94 deletions(-)
> 
> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
> index 3f7278c6ea0..14b31ddd86e 100644
> --- a/gdb/fbsd-nat.c
> +++ b/gdb/fbsd-nat.c
> @@ -54,11 +54,26 @@
>  #define	PT_SETREGSET	43	/* Set a target register set */
>  #endif
>  
> -/* Filter for ptid's allowed to report events from wait.  Normally set
> -   in resume, but also reset to minus_one_ptid in create_inferior and
> -   attach.  */
> +/* Information stored about each inferior.  */
>  
> -static ptid_t resume_ptid;
> +struct fbsd_inferior_info
> +{
> +  /* Filter for resumed LWPs which can report events from wait.  */
> +  ptid_t resumed_lwps = null_ptid;
> +
> +  /* Number of LWPs this process contains.  */
> +  unsigned int num_lwps = 0;
> +
> +  /* Number of LWPs currently running.  */
> +  unsigned int running_lwps = 0;
> +
> +  /* Have a pending SIGSTOP event that needs to be discarded.  */
> +  bool pending_sigstop = false;
> +};

Ok, it's not exactly what I expected, but I will keep on reading.

Long term, I don't think the resumed_lwps field will be enough to
describe the resume state of individual threads.  Actually, to
complement the example I gave on an earlier patch, I guess you could do
that today?

(gdb) set scheduler-locking on
(gdb) thread 1
(gdb) continue & # only supposed to resume thread 1
(gdb) thread 2
(gdb) continue & # only supposed to resume thread 2

Here, resume_lwps would end up as (pid, thread_2_lwp, 0), right?


> +
> +/* Per-inferior data key.  */
> +
> +static const registry<inferior>::key<fbsd_inferior_info> fbsd_inferior_data;
>  
>  /* If an event is triggered asynchronously (fake vfork_done events) or
>     occurs when the core is not expecting it, a pending event is
> @@ -95,21 +110,27 @@ have_pending_event (ptid_t filter)
>    return false;
>  }
>  
> -/* Helper method called by the target wait method.  Returns true if
> -   there is a pending event matching resume_ptid.  If there is a
> -   matching event, PTID and *STATUS contain the event details, and the
> -   event is removed from the pending list.  */
> +/* Returns true if there is a pending event for a resumed process
> +   matching FILTER.  If there is a matching event, PTID and *STATUS
> +   contain the event details, and the event is removed from the
> +   pending list.  */
>  
>  static bool
> -take_pending_event (ptid_t &ptid, target_waitstatus *status)
> +take_pending_event (fbsd_nat_target *target, ptid_t filter, ptid_t &ptid,
> +		    target_waitstatus *status)
>  {
>    for (auto it = pending_events.begin (); it != pending_events.end (); it++)
> -    if (it->ptid.matches (resume_ptid))
> +    if (it->ptid.matches (filter))
>        {
> -	ptid = it->ptid;
> -	*status = it->status;
> -	pending_events.erase (it);
> -	return true;
> +	inferior *inf = find_inferior_ptid (target, it->ptid);
> +	fbsd_inferior_info *info = fbsd_inferior_data.get (inf);
> +	if (it->ptid.matches (info->resumed_lwps))
> +	  {
> +	    ptid = it->ptid;
> +	    *status = it->status;
> +	    pending_events.erase (it);
> +	    return true;
> +	  }

If that code was kept as-is, I think take_pending_event should be a
method of fbsd_nat_target, rather than passing it manually.

>        }
>    return false;
>  }
> @@ -799,6 +820,8 @@ show_fbsd_nat_debug (struct ui_file *file, int from_tty,
>  #define fbsd_nat_debug_printf(fmt, ...) \
>    debug_prefixed_printf_cond (debug_fbsd_nat, "fbsd-nat", fmt, ##__VA_ARGS__)
>  
> +#define fbsd_nat_debug_start_end(fmt, ...) \
> +  scoped_debug_start_end (debug_fbsd_nat, "fbsd-nat", fmt, ##__VA_ARGS__)
>  
>  /*
>    FreeBSD's first thread support was via a "reentrant" version of libc
> @@ -956,6 +979,9 @@ fbsd_add_threads (fbsd_nat_target *target, pid_t pid)
>    if (nlwps == -1)
>      perror_with_name (("ptrace (PT_GETLWPLIST)"));
>  
> +  inferior *inf = find_inferior_ptid (target, ptid_t (pid));
> +  fbsd_inferior_info *info = fbsd_inferior_data.get (inf);
> +  gdb_assert (info != nullptr);
>    for (i = 0; i < nlwps; i++)
>      {
>        ptid_t ptid = ptid_t (pid, lwps[i]);
> @@ -974,8 +1000,14 @@ fbsd_add_threads (fbsd_nat_target *target, pid_t pid)
>  #endif
>  	  fbsd_lwp_debug_printf ("adding thread for LWP %u", lwps[i]);
>  	  add_thread (target, ptid);
> +#ifdef PT_LWP_EVENTS
> +	  info->num_lwps++;
> +#endif
>  	}
>      }
> +#ifndef PT_LWP_EVENTS
> +  info->num_lwps = nlwps;
> +#endif
>  }
>  
>  /* Implement the "update_thread_list" target_ops method.  */
> @@ -1138,92 +1170,117 @@ fbsd_add_vfork_done (ptid_t pid)
>  #endif
>  #endif
>  
> -/* Implement the "resume" target_ops method.  */
> +/* Resume a single process.  */
>  
>  void
> -fbsd_nat_target::resume (ptid_t ptid, int step, enum gdb_signal signo)
> +fbsd_nat_target::resume_one_process (ptid_t ptid, int step,
> +				     enum gdb_signal signo)
>  {
>    fbsd_nat_debug_printf ("[%s], step %d, signo %d (%s)",
>  			 target_pid_to_str (ptid).c_str (), step, signo,
>  			 gdb_signal_to_name (signo));
>  
> +  inferior *inf = find_inferior_ptid (this, ptid);
> +  fbsd_inferior_info *info = fbsd_inferior_data.get (inf);
> +  info->resumed_lwps = ptid;
> +  gdb_assert (info->running_lwps == 0);
> +
>    /* Don't PT_CONTINUE a thread or process which has a pending event.  */
> -  resume_ptid = ptid;
>    if (have_pending_event (ptid))
>      {
>        fbsd_nat_debug_printf ("found pending event");
>        return;
>      }
>  
> -  if (ptid.lwp_p ())
> +  for (thread_info *tp : inf->non_exited_threads ())
>      {
> -      /* If ptid is a specific LWP, suspend all other LWPs in the process.  */


> -      inferior *inf = find_inferior_ptid (this, ptid);
> +      int request;
>  
> -      for (thread_info *tp : inf->non_exited_threads ())
> -	{
> -	  int request;
> -
> -	  if (tp->ptid.lwp () == ptid.lwp ())
> -	    request = PT_RESUME;
> -	  else
> -	    request = PT_SUSPEND;
> -
> -	  if (ptrace (request, tp->ptid.lwp (), NULL, 0) == -1)
> -	    perror_with_name (request == PT_RESUME ?
> -			      ("ptrace (PT_RESUME)") :
> -			      ("ptrace (PT_SUSPEND)"));
> -	  if (request == PT_RESUME)
> -	    low_prepare_to_resume (tp);
> -	}
> -    }
> -  else
> -    {
> -      /* If ptid is a wildcard, resume all matching threads (they won't run
> -	 until the process is continued however).  */
> -      for (thread_info *tp : all_non_exited_threads (this, ptid))
> +      /* If ptid is a specific LWP, suspend all other LWPs in the
> +	 process, otherwise resume all LWPs in the process..  */

As mentioned in a previous message, I don't think this is how the resume
method is supposed to work.

> +      if (!ptid.lwp_p() || tp->ptid.lwp () == ptid.lwp ())
>  	{
>  	  if (ptrace (PT_RESUME, tp->ptid.lwp (), NULL, 0) == -1)
>  	    perror_with_name (("ptrace (PT_RESUME)"));
>  	  low_prepare_to_resume (tp);
> +	  info->running_lwps++;
>  	}
> +      else
> +	{
> +	  if (ptrace (PT_SUSPEND, tp->ptid.lwp (), NULL, 0) == -1)
> +	    perror_with_name (("ptrace (PT_SUSPEND)"));
> +	}
> +    }
> +
> +  if (ptid.pid () != inferior_ptid.pid ())
> +    {
> +      step = 0;
> +      signo = GDB_SIGNAL_0;
> +      gdb_assert (!ptid.lwp_p ());

I don't get why you ignore the step request here.  Perhaps it is due to
a misundertanding of the resume interface (which was really confusing
before Pedro's clarification)?

The comment on target_resume and the commit message on Pedro's change
explain it well, but essentially:

 - The ptid passed as a parameter (SCOPE_PTID) is the set of threads to
   resume.  If you keep a list of LWPs, then you can just go over that
   list and resume everything that ptid_t::matches SCOPE_PTID, and that
   doesn't have a pending event.
 - STEP indicates whether to resume INFERIOR_PTID in resume mode.  If
   STEP is 0, it means that no thread is resumed in step mode they are
   all resumed normally.
 - SIGNAL indicates what signal to resume INFERIOR_PTID with.  If
   it's GDB_SIGNAL_0, it means resume without a signal.

So, the last two bullets only modify how the thread identified by
INFERIOR_PTID is resumed.  I think that in practice, it's guaranteed in
practice today that INFERIOR_PTID is "within" SCOPE_PTID.  But you can
also write the code without that assumption, it should be much harder.

For threads that are not INFERIOR_PTID, I think the target should
resume them with the signal in thread_info::m_suspend::stop_signal.  But
that can be a problem for another day.

I'll wait for clarifications from you before continuing to read, because
I am a bit lost with the approach taken here.

Simon

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/9] fbsd-nat: Add a list of pending events.
  2023-03-20 18:35   ` Simon Marchi
@ 2023-03-27 20:24     ` John Baldwin
  2023-03-27 20:57       ` Simon Marchi
  2023-03-27 21:00       ` John Baldwin
  0 siblings, 2 replies; 24+ messages in thread
From: John Baldwin @ 2023-03-27 20:24 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 3/20/23 11:35 AM, Simon Marchi wrote:
> On 2/28/23 13:18, John Baldwin wrote:
>> The pending_events list stores a queue of deferred events that might
>> be reported by the next call to the target's wait method.  The set of
>> events that are eligible is filtered by the ptid passed to resume.
>>
>> For now this just replaces the list of vfork_done events.  A
>> subsequent commit will reuse this to store other events.
>> ---
>>   gdb/fbsd-nat.c | 146 ++++++++++++++++++++++++++++++++-----------------
>>   gdb/fbsd-nat.h |   2 +
>>   2 files changed, 98 insertions(+), 50 deletions(-)
>>
>> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
>> index 04d67fc5278..ca278b871ef 100644
>> --- a/gdb/fbsd-nat.c
>> +++ b/gdb/fbsd-nat.c
>> @@ -54,6 +54,66 @@
>>   #define	PT_SETREGSET	43	/* Set a target register set */
>>   #endif
>>   
>> +/* Filter for ptid's allowed to report events from wait.  Normally set
>> +   in resume, but also reset to minus_one_ptid in create_inferior and
>> +   attach.  */
>> +
>> +static ptid_t resume_ptid;
> 
> It doesn't change anything conceptually, but I think this should be a
> private field of fbsd_nat_target.

This does mean making all the *pending_event functions private methods
of the class or using friend in various ways.  It also doesn't exist
at the end of the series.

> This resume_ptid approach might work now, but it won't work in general,
> since it only records the scope ptid passed to the last call to resume.
> 
> Even today, do you support multi-inferior?  If the user does:
> 
> (gdb) inferior 1
> (gdb) continue &
> (gdb) inferior 2
> (gdb) continue &
> 
> Then resume_ptid would be (inf2_pid, 0, 0), and you'd be ignoring any
> event from inferior 1, right?
> 
> I think the general solution is necessarily to remember the
> resume state for each individual thread, somehow.  Up to you to see if
> you want to implement that now or defer to later.  If you only support
> all-stop with a single inferior, it might be enough for now.
> 
> For reference, the linux-nat target maintains its own data structures of
> lwps (lwp_lwpid_htab), in which the last_resume_kind field is used for
> that.  It would be unfortunate to have to reimplement all of this for
> fbsd-nat as I'm sure that a lot of the logic would be very similar.  But
> then making the code shareable would also be a lot of work.

Yes, patch 6 reworks this further to properly suport multi-process where
it tracks this type of state per-process instead of globally.  I could
squash them all together perhaps but was trying to incrementally solve
multiple problems and trying to keep the patches smaller to ease review.
In particular, patch 5 aims to solve the problem of a new thread showing
up in a process being single-stepped where events are only expected for
the thread being stepped.

>> +
>> +/* If an event is triggered asynchronously (fake vfork_done events) or
>> +   occurs when the core is not expecting it, a pending event is
>> +   created.  This event is then returned by a future call to the
>> +   target wait method.  */
> 
> It can probably also happen if two events happen at the "same" time?
> Like if you react to a breakpoint hit by one thread, and while stopping
> the other threads, another thread reports a breakpoint stop, you have to
> store that one as a pending event?

With the caveat that on FreeBSD (and I think on other BSDs) you'd have to
have those threads be in separate processes, but yes.  This patch aims to
try to make the pending vfork events list more generic so it can be used
to hold other events.
>> +
>> +struct pending_event
>> +{
>> +  pending_event (const ptid_t &_ptid, const target_waitstatus &_status) :
>> +    ptid(_ptid), status(_status) {}
> 
> Spaces after parentheses.

Fixed (along with other style bugs you pointed out)

>> +
>> +  ptid_t ptid;
>> +  target_waitstatus status;
>> +};
>> +
>> +static std::list<pending_event> pending_events;
> 
> This could use intrusive_list, which is a bit more lightweight than
> std::list (it won't really make a difference in practice, but we have
> it, so we might as well use it).

Note though that this is a list of objects, not a list of pointers,
so as a std::list the objects are embedded in a single allocation
along with the pointers (IIUC) (e.g. the current code uses emplace_back
to construct the object in place in the list node).  In order to use an
intrusive list, the code would now have to explicitly new an object
before doing a push_back, and take_pending_event would have to explicitly
delete the object after the call to erase().

>> +   matching event, PTID and *STATUS contain the event details, and the
>> +   event is removed from the pending list.  */
>> +
>> +static bool
>> +take_pending_event (ptid_t &ptid, target_waitstatus *status)
>> +{
>> +  for (auto it = pending_events.begin (); it != pending_events.end (); it++)
>> +    if (it->ptid.matches (resume_ptid))
>> +      {
>> +	ptid = it->ptid;
>> +	*status = it->status;
>> +	pending_events.erase (it);
>> +	return true;
>> +      }
>> +  return false;
>> +}
> 
> This could maybe return a gdb::optional<pending_event>?

Ok.
>> @@ -1061,47 +1121,20 @@ fbsd_is_child_pending (pid_t pid)
>>   }
>>   
>>   #ifndef PTRACE_VFORK
>> -static std::forward_list<ptid_t> fbsd_pending_vfork_done;
>> -
>>   /* Record a pending vfork done event.  */
>>   
>>   static void
>>   fbsd_add_vfork_done (ptid_t pid)
>>   {
>> -  fbsd_pending_vfork_done.push_front (pid);
>> +  target_waitstatus status;
>> +  status.set_vfork_done ();
>> +  add_pending_event (ptid, status);
> 
> I think you can do this?
> 
>    add_pending_event (ptid, target_waitstatus ().set_vfork_done ());
> 
> Not a C++ expert, but I believe the temporary target_waitstatus is kept
> alive long enough for this to work.

Yes, I think this is fine.
  
>> @@ -1110,21 +1143,18 @@ fbsd_next_vfork_done (void)
>>   void
>>   fbsd_nat_target::resume (ptid_t ptid, int step, enum gdb_signal signo)
> 
> It would be a good time to rename the first parameter from ptid to
> scope_ptid, to match what was done here:
> 
> https://gitlab.com/gnutools/binutils-gdb/-/commit/d51926f06a7f4854bebdd71dcb0a78dbaa2f4168
> 
> I think it helps understand the meaning of the parameter.

Hmm, I might do the rename as a separate commit first to avoid adding noise
in this patch.

>>   {
>> -#if defined(TDP_RFPPWAIT) && !defined(PTRACE_VFORK)
>> -  pid_t pid;
>> -
>> -  /* Don't PT_CONTINUE a process which has a pending vfork done event.  */
>> -  if (minus_one_ptid == ptid)
>> -    pid = inferior_ptid.pid ();
>> -  else
>> -    pid = ptid.pid ();
>> -  if (fbsd_is_vfork_done_pending (pid))
>> -    return;
>> -#endif
> 
> Yeah, this old code seems to misinterpret the target_ops::resume
> interface, which was clarified in the commit mentioned above.
> 
>> +void
>> +fbsd_nat_target::attach (const char *args, int from_tty)
>> +{
>> +  /* Expect a wait for the new process.  */
>> +  resume_ptid = minus_one_ptid;
>> +  fbsd_nat_debug_printf ("setting resume_ptid to [%s]",
>> +			 target_pid_to_str (resume_ptid).c_str ());
>> +  inf_ptrace_target::attach (args, from_tty);
>> +}
>> +
> 
> With the "general" solution where you keep rack of the resume state of
> all threads, this part would be a bit cleaner I think.  Any thread
> created due to an attach or due to create_inferior would start in the
> "resumed" state.

So later on in patch 6 what happened is that I ended up with a
per-process structure that tracked the state of an inferior and
resume_ptid is per-inferior instead of global.

-- 
John Baldwin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/9] fbsd-nat: Defer any ineligible events reported by wait.
  2023-03-20 19:09   ` Simon Marchi
@ 2023-03-27 20:49     ` John Baldwin
  2023-03-27 21:04       ` Simon Marchi
  0 siblings, 1 reply; 24+ messages in thread
From: John Baldwin @ 2023-03-27 20:49 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 3/20/23 12:09 PM, Simon Marchi wrote:
> On 2/28/23 13:18, John Baldwin wrote:
>> If wait_1 finds an event for a thread or process that does not match
>> the set of threads and processes previously resumed, defer the event.
>> If the event is for a specific thread, suspend the thread and continue
>> the associated process before waiting for another event.
> 
> I do not understand this, because it doesn't match my mental model of
> how things should work (which is derived from working with linux-nat).
> In that model, a thread with a pending event should never be
> ptrace-resumed.  So the last sentence doesn't make sense, it implies
> that a thread has a pending and is ptrace-resumed (since we suspend it).

The difference here might be that on FreeBSD ptrace can only PT_CONTINUE
and wait() for entire processes.  When a thread contains multiple threads,
PT_CONTINUE resumes all threads belonging to that process.  Individual
threads can be controlled by using PT_SUSPEND on a specific LWP.  Those
LWPs will stay asleep after PT_CONTINUE.  If any thread encounters a
stop event (breakpoint, fork event, etc.) the entire process stops (the
kernel signals threads in userland if needed to get them back into the
kernel to stop).  Once it is stopped it reports an event via wait().

Currently on FreeBSD wait() only reports a single event at once.  So
if multiple threads hit a breakpoint, wait() returns for the first one,
PT_LWPINFO is used on the pid to find out what specific event happened
(and which LWP it happened for).  The other thread will keep its event
pending in the kernel, and after PT_CONTINUE it will immediately trigger
another stop and an event reported from wait().

>> One specific example of such an event is if a thread is created while
>> another thread in the same process hits a breakpoint.  If the second
>> thread's event is reported first, the target resume method does not
>> yet "know" about the new thread and will not suspend it via
>> PT_SUSPEND.
> 
> I was surprised to read that the resume method suspends some threads.  I
> don't think it should, but I do see the current code.  The current code
> appears to interpret the resume request as: "ensure the state of thread
> resumption looks exactly like this", so it suspends threads as needed,
> to match the requested resumption state.  I don't think it's supposed to
> work like that.  The resume method should only resume some threads, and
> if the core of GDB wants some threads stopped, it will call the stop
> method.

See above.  I can't individually suspend/resume threads.  I can only
resume processes, but decide which set of threads within a process
to keep resumed when resuming the entire process.

> Anyhow, just to be sure I understand the problem you describe: when
> fbsd-nat reports the breakpoint hit from "another thread", is the
> "thread created" event still in the kernel, not consumed by GDB?

Yes.  What happens is that GDB tries to single-step a thread over a
breakpoint, so it does a resume() of a single ptid_t.  To implement
this, fbsd-nat's resume has to explicitly PT_SUSPEND all the other
LWPs in the process before doing a PT_CONTINUE so that only the
stepped thread will execute.  However, this logic doesn't yet know
about the new thread (and it hasn't started executing in userland
yet).  However, once the new thread starts executing, it immediately
reports a ptrace event.  This confuses GDB currently becuase that
ptrace event can be reported right after the PT_CONTINUE.  I think
in part this is because unlike on Linux, when thread A creates thread
B, thread A doesn't report anything.  Instead, only thread B reports
an event when it starts.  On Linux thread creation is more like fork
where thread A reports an event saying it created B.
  
>> When wait is called, it will probably return the event
>> from the first thread before the result of the step from second
>> thread.  This is the case reported in PR 21497.
> 
>> ---
>>   gdb/fbsd-nat.c | 34 +++++++++++++++++++++++++++++++++-
>>   1 file changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
>> index ca278b871ef..3f7278c6ea0 100644
>> --- a/gdb/fbsd-nat.c
>> +++ b/gdb/fbsd-nat.c
>> @@ -1514,7 +1514,39 @@ fbsd_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
>>     if (is_async_p ())
>>       async_file_flush ();
>>   
>> -  wptid = wait_1 (ptid, ourstatus, target_options);
>> +  while (1)
>> +    {
>> +      wptid = wait_1 (ptid, ourstatus, target_options);
>> +
>> +      /* If no event was found, just return.  */
>> +      if (ourstatus->kind () == TARGET_WAITKIND_IGNORE
>> +	  || ourstatus->kind () == TARGET_WAITKIND_NO_RESUMED)
>> +	break;
>> +
>> +      /* If an event is reported for a thread or process while
>> +	 stepping some other thread, suspend the thread reporting the
>> +	 event and defer the event until it can be reported to the
>> +	 core.  */
>> +      if (!wptid.matches (resume_ptid))
>> +	{
>> +	  add_pending_event (wptid, *ourstatus);
>> +	  fbsd_nat_debug_printf ("deferring event [%s], [%s]",
>> +				 target_pid_to_str (wptid).c_str (),
>> +				 ourstatus->to_string ().c_str ());
>> +	  if (wptid.pid () == resume_ptid.pid ())
>> +	    {
>> +	      fbsd_nat_debug_printf ("suspending thread [%s]",
>> +				     target_pid_to_str (wptid).c_str ());
>> +	      if (ptrace (PT_SUSPEND, wptid.lwp (), NULL, 0) == -1)
>> +		perror_with_name (("ptrace (PT_SUSPEND)"));
> 
> 
> This is the part I don't understand.  After wait_1 returned an event for
> a specific thread, I would expect this thread to be ptrace-stopped.  So,
> I don't see the need to suspend it.  You'd just leave it in the "resumed
> from the
> until.
> 
>> +	      if (ptrace (PT_CONTINUE, wptid.pid (), (caddr_t) 1, 0) == -1)
>> +		perror_with_name (("ptrace (PT_CONTINUE)"));
> 
> I don't get why you continue `wptid.pid ()`.  What assures you that this
> particular thread was stopped previously?  Perhaps I don't understand
> because I assume somethings about pids/lwps and ptrace that are only
> true on Linux.

In this specific case, here is the sequence of events:

- process P1 contains two threads, T1, and T2
- T1 creates a new thread T3, but T3 isn't yet scheduled to execute
- T2 hits a breakpoint so the process stops
- wait() returns P1
- PT_LWPINFO reports the breakpoint event for T2
- GDB undoes the breakpoint and tries to single-step T2 over the
   breakpoint
- GDB calls resume with step=1 and scope_ptid=ptid(P1, T2)
- fbsd_nat::resume iterates over the threads it knows about, but
   it only knows about T1 and T2
   - PT_SUSPEND T1
   - PT_RESUME T2
- the P1 process is now resumed via PT_CONTINUE
- T3 starts executing and reports its "born" event
- wait() returns P1
- PT_LWPINFO reports the thread birth event for T3

Previously at this point fbsd_nat::wait() returned an event for T3
which triggers the assertion failure in the PR.  With this patch
what happens now is:

- fbsd_nat::wait() realizes T3 isn't "resumed" from the core's
   perspective so explicitly suspends it via PT_SUSPEND
- the P1 process is continued via PT_CONTINUE
- T2 completes its step and reports the event
- wait() returns P1
- PT_LWPINFO reports the step-complete event for P2
- the event for T2 is returned from fbsd_nat::wait() to the core

-- 
John Baldwin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/9] fbsd-nat: Add a list of pending events.
  2023-03-27 20:24     ` John Baldwin
@ 2023-03-27 20:57       ` Simon Marchi
  2023-03-27 21:00       ` John Baldwin
  1 sibling, 0 replies; 24+ messages in thread
From: Simon Marchi @ 2023-03-27 20:57 UTC (permalink / raw)
  To: John Baldwin, gdb-patches

On 3/27/23 16:24, John Baldwin wrote:
> On 3/20/23 11:35 AM, Simon Marchi wrote:
>> On 2/28/23 13:18, John Baldwin wrote:
>>> The pending_events list stores a queue of deferred events that might
>>> be reported by the next call to the target's wait method.  The set of
>>> events that are eligible is filtered by the ptid passed to resume.
>>>
>>> For now this just replaces the list of vfork_done events.  A
>>> subsequent commit will reuse this to store other events.
>>> ---
>>>   gdb/fbsd-nat.c | 146 ++++++++++++++++++++++++++++++++-----------------
>>>   gdb/fbsd-nat.h |   2 +
>>>   2 files changed, 98 insertions(+), 50 deletions(-)
>>>
>>> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
>>> index 04d67fc5278..ca278b871ef 100644
>>> --- a/gdb/fbsd-nat.c
>>> +++ b/gdb/fbsd-nat.c
>>> @@ -54,6 +54,66 @@
>>>   #define    PT_SETREGSET    43    /* Set a target register set */
>>>   #endif
>>>   +/* Filter for ptid's allowed to report events from wait.  Normally set
>>> +   in resume, but also reset to minus_one_ptid in create_inferior and
>>> +   attach.  */
>>> +
>>> +static ptid_t resume_ptid;
>>
>> It doesn't change anything conceptually, but I think this should be a
>> private field of fbsd_nat_target.
> 
> This does mean making all the *pending_event functions private methods
> of the class or using friend in various ways.  It also doesn't exist
> at the end of the series.

Ok that's fine with me.

>> This resume_ptid approach might work now, but it won't work in general,
>> since it only records the scope ptid passed to the last call to resume.
>>
>> Even today, do you support multi-inferior?  If the user does:
>>
>> (gdb) inferior 1
>> (gdb) continue &
>> (gdb) inferior 2
>> (gdb) continue &
>>
>> Then resume_ptid would be (inf2_pid, 0, 0), and you'd be ignoring any
>> event from inferior 1, right?
>>
>> I think the general solution is necessarily to remember the
>> resume state for each individual thread, somehow.  Up to you to see if
>> you want to implement that now or defer to later.  If you only support
>> all-stop with a single inferior, it might be enough for now.
>>
>> For reference, the linux-nat target maintains its own data structures of
>> lwps (lwp_lwpid_htab), in which the last_resume_kind field is used for
>> that.  It would be unfortunate to have to reimplement all of this for
>> fbsd-nat as I'm sure that a lot of the logic would be very similar.  But
>> then making the code shareable would also be a lot of work.
> 
> Yes, patch 6 reworks this further to properly suport multi-process where
> it tracks this type of state per-process instead of globally.  I could
> squash them all together perhaps but was trying to incrementally solve
> multiple problems and trying to keep the patches smaller to ease review.
> In particular, patch 5 aims to solve the problem of a new thread showing
> up in a process being single-stepped where events are only expected for
> the thread being stepped.

That's ok.  I review patches linearly, making comments as I go.  I don't
have the mental swap space to look at everything and then comment based
on the whole series (even though it would be better).  So it's
appropriate to tell me "it's irrelevant, it changes later".

> Note though that this is a list of objects, not a list of pointers,
> so as a std::list the objects are embedded in a single allocation
> along with the pointers (IIUC) (e.g. the current code uses emplace_back
> to construct the object in place in the list node).  In order to use an
> intrusive list, the code would now have to explicitly new an object
> before doing a push_back, and take_pending_event would have to explicitly
> delete the object after the call to erase().

Indeed.

Simon

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/9] fbsd-nat: Add a list of pending events.
  2023-03-27 20:24     ` John Baldwin
  2023-03-27 20:57       ` Simon Marchi
@ 2023-03-27 21:00       ` John Baldwin
  1 sibling, 0 replies; 24+ messages in thread
From: John Baldwin @ 2023-03-27 21:00 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 3/27/23 1:24 PM, John Baldwin wrote:
> On 3/20/23 11:35 AM, Simon Marchi wrote:
>>> @@ -1110,21 +1143,18 @@ fbsd_next_vfork_done (void)
>>>    void
>>>    fbsd_nat_target::resume (ptid_t ptid, int step, enum gdb_signal signo)
>>
>> It would be a good time to rename the first parameter from ptid to
>> scope_ptid, to match what was done here:
>>
>> https://gitlab.com/gnutools/binutils-gdb/-/commit/d51926f06a7f4854bebdd71dcb0a78dbaa2f4168
>>
>> I think it helps understand the meaning of the parameter.
> 
> Hmm, I might do the rename as a separate commit first to avoid adding noise
> in this patch.

Actually, I think I will do this in the patch that reworks resume() to
properly handle multiple inferiors as that is when it starts to really
be treated as scope_ptid.

-- 
John Baldwin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/9] fbsd-nat: Defer any ineligible events reported by wait.
  2023-03-27 20:49     ` John Baldwin
@ 2023-03-27 21:04       ` Simon Marchi
  0 siblings, 0 replies; 24+ messages in thread
From: Simon Marchi @ 2023-03-27 21:04 UTC (permalink / raw)
  To: John Baldwin, gdb-patches

On 3/27/23 16:49, John Baldwin wrote:
> On 3/20/23 12:09 PM, Simon Marchi wrote:
>> On 2/28/23 13:18, John Baldwin wrote:
>>> If wait_1 finds an event for a thread or process that does not match
>>> the set of threads and processes previously resumed, defer the event.
>>> If the event is for a specific thread, suspend the thread and continue
>>> the associated process before waiting for another event.
>>
>> I do not understand this, because it doesn't match my mental model of
>> how things should work (which is derived from working with linux-nat).
>> In that model, a thread with a pending event should never be
>> ptrace-resumed.  So the last sentence doesn't make sense, it implies
>> that a thread has a pending and is ptrace-resumed (since we suspend it).
> 
> The difference here might be that on FreeBSD ptrace can only PT_CONTINUE
> and wait() for entire processes.  When a thread contains multiple threads,
> PT_CONTINUE resumes all threads belonging to that process.  Individual
> threads can be controlled by using PT_SUSPEND on a specific LWP.  Those
> LWPs will stay asleep after PT_CONTINUE.  If any thread encounters a
> stop event (breakpoint, fork event, etc.) the entire process stops (the
> kernel signals threads in userland if needed to get them back into the
> kernel to stop).  Once it is stopped it reports an event via wait().
> 
> Currently on FreeBSD wait() only reports a single event at once.  So
> if multiple threads hit a breakpoint, wait() returns for the first one,
> PT_LWPINFO is used on the pid to find out what specific event happened
> (and which LWP it happened for).  The other thread will keep its event
> pending in the kernel, and after PT_CONTINUE it will immediately trigger
> another stop and an event reported from wait().

Thanks for the explanation.  I'll keep that in mind when re-reading
(perhaps a v2, if you want to rebase and submit a fresh version).

>>> One specific example of such an event is if a thread is created while
>>> another thread in the same process hits a breakpoint.  If the second
>>> thread's event is reported first, the target resume method does not
>>> yet "know" about the new thread and will not suspend it via
>>> PT_SUSPEND.
>>
>> I was surprised to read that the resume method suspends some threads.  I
>> don't think it should, but I do see the current code.  The current code
>> appears to interpret the resume request as: "ensure the state of thread
>> resumption looks exactly like this", so it suspends threads as needed,
>> to match the requested resumption state.  I don't think it's supposed to
>> work like that.  The resume method should only resume some threads, and
>> if the core of GDB wants some threads stopped, it will call the stop
>> method.
> 
> See above.  I can't individually suspend/resume threads.  I can only
> resume processes, but decide which set of threads within a process
> to keep resumed when resuming the entire process.

Ack.

> 
>> Anyhow, just to be sure I understand the problem you describe: when
>> fbsd-nat reports the breakpoint hit from "another thread", is the
>> "thread created" event still in the kernel, not consumed by GDB?
> 
> Yes.  What happens is that GDB tries to single-step a thread over a
> breakpoint, so it does a resume() of a single ptid_t.  To implement
> this, fbsd-nat's resume has to explicitly PT_SUSPEND all the other
> LWPs in the process before doing a PT_CONTINUE so that only the
> stepped thread will execute.  However, this logic doesn't yet know
> about the new thread (and it hasn't started executing in userland
> yet).  However, once the new thread starts executing, it immediately
> reports a ptrace event.  This confuses GDB currently becuase that
> ptrace event can be reported right after the PT_CONTINUE.  I think
> in part this is because unlike on Linux, when thread A creates thread
> B, thread A doesn't report anything.  Instead, only thread B reports
> an event when it starts.  On Linux thread creation is more like fork
> where thread A reports an event saying it created B.

Ack.

>  
>>> When wait is called, it will probably return the event
>>> from the first thread before the result of the step from second
>>> thread.  This is the case reported in PR 21497.
>>
>>> ---
>>>   gdb/fbsd-nat.c | 34 +++++++++++++++++++++++++++++++++-
>>>   1 file changed, 33 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
>>> index ca278b871ef..3f7278c6ea0 100644
>>> --- a/gdb/fbsd-nat.c
>>> +++ b/gdb/fbsd-nat.c
>>> @@ -1514,7 +1514,39 @@ fbsd_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
>>>     if (is_async_p ())
>>>       async_file_flush ();
>>>   -  wptid = wait_1 (ptid, ourstatus, target_options);
>>> +  while (1)
>>> +    {
>>> +      wptid = wait_1 (ptid, ourstatus, target_options);
>>> +
>>> +      /* If no event was found, just return.  */
>>> +      if (ourstatus->kind () == TARGET_WAITKIND_IGNORE
>>> +      || ourstatus->kind () == TARGET_WAITKIND_NO_RESUMED)
>>> +    break;
>>> +
>>> +      /* If an event is reported for a thread or process while
>>> +     stepping some other thread, suspend the thread reporting the
>>> +     event and defer the event until it can be reported to the
>>> +     core.  */
>>> +      if (!wptid.matches (resume_ptid))
>>> +    {
>>> +      add_pending_event (wptid, *ourstatus);
>>> +      fbsd_nat_debug_printf ("deferring event [%s], [%s]",
>>> +                 target_pid_to_str (wptid).c_str (),
>>> +                 ourstatus->to_string ().c_str ());
>>> +      if (wptid.pid () == resume_ptid.pid ())
>>> +        {
>>> +          fbsd_nat_debug_printf ("suspending thread [%s]",
>>> +                     target_pid_to_str (wptid).c_str ());
>>> +          if (ptrace (PT_SUSPEND, wptid.lwp (), NULL, 0) == -1)
>>> +        perror_with_name (("ptrace (PT_SUSPEND)"));
>>
>>
>> This is the part I don't understand.  After wait_1 returned an event for
>> a specific thread, I would expect this thread to be ptrace-stopped.  So,
>> I don't see the need to suspend it.  You'd just leave it in the "resumed
>> from the
>> until.
>>
>>> +          if (ptrace (PT_CONTINUE, wptid.pid (), (caddr_t) 1, 0) == -1)
>>> +        perror_with_name (("ptrace (PT_CONTINUE)"));
>>
>> I don't get why you continue `wptid.pid ()`.  What assures you that this
>> particular thread was stopped previously?  Perhaps I don't understand
>> because I assume somethings about pids/lwps and ptrace that are only
>> true on Linux.
> 
> In this specific case, here is the sequence of events:
> 
> - process P1 contains two threads, T1, and T2
> - T1 creates a new thread T3, but T3 isn't yet scheduled to execute
> - T2 hits a breakpoint so the process stops
> - wait() returns P1
> - PT_LWPINFO reports the breakpoint event for T2
> - GDB undoes the breakpoint and tries to single-step T2 over the
>   breakpoint
> - GDB calls resume with step=1 and scope_ptid=ptid(P1, T2)
> - fbsd_nat::resume iterates over the threads it knows about, but
>   it only knows about T1 and T2
>   - PT_SUSPEND T1
>   - PT_RESUME T2
> - the P1 process is now resumed via PT_CONTINUE
> - T3 starts executing and reports its "born" event
> - wait() returns P1
> - PT_LWPINFO reports the thread birth event for T3
> 
> Previously at this point fbsd_nat::wait() returned an event for T3
> which triggers the assertion failure in the PR.  With this patch
> what happens now is:
> 
> - fbsd_nat::wait() realizes T3 isn't "resumed" from the core's
>   perspective so explicitly suspends it via PT_SUSPEND
> - the P1 process is continued via PT_CONTINUE
> - T2 completes its step and reports the event
> - wait() returns P1
> - PT_LWPINFO reports the step-complete event for P2
> - the event for T2 is returned from fbsd_nat::wait() to the core

Ok, thanks, that makes sense.

Simon

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/9] fbsd-nat: Fix resuming and waiting with multiple processes.
  2023-03-20 19:55   ` Simon Marchi
@ 2023-03-27 21:13     ` John Baldwin
  0 siblings, 0 replies; 24+ messages in thread
From: John Baldwin @ 2023-03-27 21:13 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 3/20/23 12:55 PM, Simon Marchi wrote:
> On 2/28/23 13:18, John Baldwin wrote:
>> I did not fully understand the requirements of multiple process
>> support when I enabled it previously and several parts were broken.
>> In particular, the resume method was only resuming a single process,
>> and wait was not stopping other processes when reporting an event.
>>
>> To support multiple running inferiors, add a new per-inferior
>> structure which trackes the number of existing and running LWPs for
>> each process.  The structure also stores a ptid_t describing the
>> set of LWPs currently resumed for each process.
> 
> Ah, that sounds good, related to my comments on previous patches about
> tracking the resume state for each LWP.  I don't know if it needs to be
> per-inferior though, versus a global table like Linux does (but perhaps
> it helps, I'll see when reading the code).
> 
>>
>> For the resume method, iterate over all non-exited inferiors resuming
>> each process matching the passed in ptid rather than only resuming the
>> current inferior's process for a wildcard ptid.  If a resumed process
>> has a pending event, don't actually resume the process, but other
>> matching processes without a pending event are still resumed in case
>> the later call to the wait method requests an event from one of the
>> processes without a pending event.
>>
>> For the wait method, stop other running processes before returning an
>> event to the core.  When stopping a process, first check to see if an
>> event is already pending.  If it is, queue the event to be reported
>> later.  If not, send a SIGSTOP to the process and wait for it to stop.
>> If the event reported by the wait is not for the SIGSTOP, queue the
>> event and remember to ignore a future SIGSTOP event for the process.
>>
>> Note that, unlike the Linux native target, entire processes are
>> stopped rather than individual LWPs.  In FreeBSD one can only wait on
>> processes (via pid), not for an event from a specific thread.
>>
>> Other changes in this commit handle bookkeeping for the per-inferior
>> data such as purging the data in the mourn_inferior method and
>> migrating the data to the new inferior in the follow_exec method.  The
>> per-inferior data is created in the attach, create_inferior, and
>> follow_fork methods.
>> ---
>>   gdb/fbsd-nat.c | 403 +++++++++++++++++++++++++++++++++++++------------
>>   gdb/fbsd-nat.h |   8 +
>>   2 files changed, 317 insertions(+), 94 deletions(-)
>>
>> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
>> index 3f7278c6ea0..14b31ddd86e 100644
>> --- a/gdb/fbsd-nat.c
>> +++ b/gdb/fbsd-nat.c
>> @@ -54,11 +54,26 @@
>>   #define	PT_SETREGSET	43	/* Set a target register set */
>>   #endif
>>   
>> -/* Filter for ptid's allowed to report events from wait.  Normally set
>> -   in resume, but also reset to minus_one_ptid in create_inferior and
>> -   attach.  */
>> +/* Information stored about each inferior.  */
>>   
>> -static ptid_t resume_ptid;
>> +struct fbsd_inferior_info
>> +{
>> +  /* Filter for resumed LWPs which can report events from wait.  */
>> +  ptid_t resumed_lwps = null_ptid;
>> +
>> +  /* Number of LWPs this process contains.  */
>> +  unsigned int num_lwps = 0;
>> +
>> +  /* Number of LWPs currently running.  */
>> +  unsigned int running_lwps = 0;
>> +
>> +  /* Have a pending SIGSTOP event that needs to be discarded.  */
>> +  bool pending_sigstop = false;
>> +};
> 
> Ok, it's not exactly what I expected, but I will keep on reading.
> 
> Long term, I don't think the resumed_lwps field will be enough to
> describe the resume state of individual threads.  Actually, to
> complement the example I gave on an earlier patch, I guess you could do
> that today?
> 
> (gdb) set scheduler-locking on
> (gdb) thread 1
> (gdb) continue & # only supposed to resume thread 1
> (gdb) thread 2
> (gdb) continue & # only supposed to resume thread 2
> 
> Here, resume_lwps would end up as (pid, thread_2_lwp, 0), right?

Ok, that answers a question I had then.  I do have a follow-up to this
I haven't posted (I mentioned it in the cover letter) where I replace
the single ptid with an unordered_set<> of LWPs belonging to the
process that should be resumed.  However, in order to make this work
I had to make all "real" resume calls deferred using commit_resumed
(but that only kind of works, I have to explicitly do commit_resumed
at the start of wait() because the core doesn't do it for me).  Basically,
in my followup the fbsd_nat::resume method just modifies state in the
per-inferior struct to keep track of at most one pending signal/step
and a set of LWPs to resume.  Then when either commit_resumed or wait
is called I walk all inferiors for the native target doing the actual
resume (PT_CONTINUE) after using PT_SUSPEND/PT_RESUME on the set of
LWPs that need to actually run for the given process.

Currently though I get new regressions with that approach compared to
this series. :(

FWIW, with this series what would happen for your example above is
that an assert() trips in fbsd_nat::resume when it tries to resume
the second thread when the process is already resumed.

>> +
>> +/* Per-inferior data key.  */
>> +
>> +static const registry<inferior>::key<fbsd_inferior_info> fbsd_inferior_data;
>>   
>>   /* If an event is triggered asynchronously (fake vfork_done events) or
>>      occurs when the core is not expecting it, a pending event is
>> @@ -95,21 +110,27 @@ have_pending_event (ptid_t filter)
>>     return false;
>>   }
>>   
>> -/* Helper method called by the target wait method.  Returns true if
>> -   there is a pending event matching resume_ptid.  If there is a
>> -   matching event, PTID and *STATUS contain the event details, and the
>> -   event is removed from the pending list.  */
>> +/* Returns true if there is a pending event for a resumed process
>> +   matching FILTER.  If there is a matching event, PTID and *STATUS
>> +   contain the event details, and the event is removed from the
>> +   pending list.  */
>>   
>>   static bool
>> -take_pending_event (ptid_t &ptid, target_waitstatus *status)
>> +take_pending_event (fbsd_nat_target *target, ptid_t filter, ptid_t &ptid,
>> +		    target_waitstatus *status)
>>   {
>>     for (auto it = pending_events.begin (); it != pending_events.end (); it++)
>> -    if (it->ptid.matches (resume_ptid))
>> +    if (it->ptid.matches (filter))
>>         {
>> -	ptid = it->ptid;
>> -	*status = it->status;
>> -	pending_events.erase (it);
>> -	return true;
>> +	inferior *inf = find_inferior_ptid (target, it->ptid);
>> +	fbsd_inferior_info *info = fbsd_inferior_data.get (inf);
>> +	if (it->ptid.matches (info->resumed_lwps))
>> +	  {
>> +	    ptid = it->ptid;
>> +	    *status = it->status;
>> +	    pending_events.erase (it);
>> +	    return true;
>> +	  }
> 
> If that code was kept as-is, I think take_pending_event should be a
> method of fbsd_nat_target, rather than passing it manually.

Ok.

>> +
>> +  if (ptid.pid () != inferior_ptid.pid ())
>> +    {
>> +      step = 0;
>> +      signo = GDB_SIGNAL_0;
>> +      gdb_assert (!ptid.lwp_p ());
> 
> I don't get why you ignore the step request here.  Perhaps it is due to
> a misundertanding of the resume interface (which was really confusing
> before Pedro's clarification)?
> 
> The comment on target_resume and the commit message on Pedro's change
> explain it well, but essentially:
> 
>   - The ptid passed as a parameter (SCOPE_PTID) is the set of threads to
>     resume.  If you keep a list of LWPs, then you can just go over that
>     list and resume everything that ptid_t::matches SCOPE_PTID, and that
>     doesn't have a pending event.
>   - STEP indicates whether to resume INFERIOR_PTID in resume mode.  If
>     STEP is 0, it means that no thread is resumed in step mode they are
>     all resumed normally.
>   - SIGNAL indicates what signal to resume INFERIOR_PTID with.  If
>     it's GDB_SIGNAL_0, it means resume without a signal.
> 
> So, the last two bullets only modify how the thread identified by
> INFERIOR_PTID is resumed.  I think that in practice, it's guaranteed in
> practice today that INFERIOR_PTID is "within" SCOPE_PTID.  But you can
> also write the code without that assumption, it should be much harder.
> 
> For threads that are not INFERIOR_PTID, I think the target should
> resume them with the signal in thread_info::m_suspend::stop_signal.  But
> that can be a problem for another day.

Mmmm, that would be good to know, that detail is not obvious.

> I'll wait for clarifications from you before continuing to read, because
> I am a bit lost with the approach taken here.

To clarify the code above, it might be helpful to note that there is now
a "resume_one_process" function that you are replying to above that is
called in a loop by the actual resume method:

void
fbsd_nat_target::resume (ptid_t scope_ptid, int step, enum gdb_signal signo)
{
   fbsd_nat_debug_start_end ("[%s], step %d, signo %d (%s)",
			    target_pid_to_str (scope_ptid).c_str (), step, signo,
			    gdb_signal_to_name (signo));

   gdb_assert (inferior_ptid.matches (scope_ptid));
   gdb_assert (!scope_ptid.tid_p ());

   if (scope_ptid == minus_one_ptid)
     {
       for (inferior *inf : all_non_exited_inferiors (this))
	resume_one_process (ptid_t (inf->pid), step, signo);
     }
   else
     {
       resume_one_process (scope_ptid, step, signo);
     }
}

The reason for the quoted code that clears the step/signal field in
resume_one_process is that I wasn't sure if I could get a "global" resume
(scope_ptid == minus_one_ptid) where step and/or signo was also set.  In
that case I wanted to be sure that I only applied to requested step/signo
to inferior_ptid and not the other processes.  That is, it's trying to
handle a case of:

- inferior_ptid == ptid(P1, T1)
- resume(minus_one_ptid, step=1)

In that case it does the loop over all inferiors, but I only want
to pass step=1 down to inf_ptrace::resume() for the inferior (process)
whose pid is P1.

If you are telling me I can write an assertion in ::resume() that is
more like:

if (step || signo != GDB_SIGNAL_0)
    gdb_assert(scope_ptid != minus_one_ptid);

Then that would mean I could avoid clearing them in resume_one_process.

-- 
John Baldwin


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-03-27 21:13 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-28 18:18 [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
2023-02-28 18:18 ` [PATCH 1/9] fbsd-nat: Add missing spaces John Baldwin
2023-03-20 17:43   ` Simon Marchi
2023-03-20 18:08     ` John Baldwin
2023-02-28 18:18 ` [PATCH 2/9] fbsd-nat: Avoid a direct write to target_waitstatus::kind John Baldwin
2023-03-20 17:45   ` Simon Marchi
2023-02-28 18:18 ` [PATCH 3/9] fbsd-nat: Use correct constant for target_waitstatus::sig John Baldwin
2023-03-20 17:47   ` Simon Marchi
2023-02-28 18:18 ` [PATCH 4/9] fbsd-nat: Add a list of pending events John Baldwin
2023-03-20 18:35   ` Simon Marchi
2023-03-27 20:24     ` John Baldwin
2023-03-27 20:57       ` Simon Marchi
2023-03-27 21:00       ` John Baldwin
2023-02-28 18:18 ` [PATCH 5/9] fbsd-nat: Defer any ineligible events reported by wait John Baldwin
2023-03-20 19:09   ` Simon Marchi
2023-03-27 20:49     ` John Baldwin
2023-03-27 21:04       ` Simon Marchi
2023-02-28 18:18 ` [PATCH 6/9] fbsd-nat: Fix resuming and waiting with multiple processes John Baldwin
2023-03-20 19:55   ` Simon Marchi
2023-03-27 21:13     ` John Baldwin
2023-02-28 18:18 ` [PATCH 7/9] fbsd-nat: Fix several issues with detaching John Baldwin
2023-02-28 18:18 ` [PATCH 8/9] fbsd-nat: Fix thread_alive against a running thread John Baldwin
2023-02-28 18:18 ` [PATCH 9/9] fbsd-nat: Stop a process if it is running before killing it John Baldwin
2023-03-14 18:21 ` [PING] [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).