public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 0/8] Some fixes for handling vfork by multi-threaded programs
@ 2022-01-17 16:27 Simon Marchi
  2022-01-17 16:27 ` [PATCH 1/8] gdb/infrun: add reason parameter to stop_all_threads Simon Marchi
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Simon Marchi @ 2022-01-17 16:27 UTC (permalink / raw)
  To: gdb-patches

Someone reported to me a while ago that stepping in a multi-threaded
program where some threads sometimes call vfork would make GDB behave
erratically.  Some "step"s or "next"s would land in the wrong place, or
just behave as "continue".  After a bit of investigation, I realized
that GDB didn't handle vforks by multi-threaded program correctly.  See
patch "fix handling of vfork by multi-threaded program" for an in-depth
explanation.

While working on this, I tried to fix all combinations of
target-non-stop, non-stop, schedule-multiple, detach-on-fork and
follow-fork-mode.  However, I was not seeing the end of it and it was
making me a bit crazy.  So I chose to only focus on the "default" case
of detach-on-fork=on and follow-fork-mode=parent for now.  It will
probably make it easier for the reviewers too.

Simon Marchi (8):
  gdb/infrun: add reason parameter to stop_all_threads
  gdb/linux-nat: remove check based on current_inferior in
    linux_handle_extended_wait
  gdb: replace inferior::waiting_for_vfork_done with
    inferior::thread_waiting_for_vfork_done
  gdb/infrun: add inferior parameters to stop_all_threads and
    restart_threads
  gdb/infrun: add logging statement to do_target_resume
  gdb: fix handling of vfork by multi-threaded program
    (follow-fork-mode=parent, detach-on-fork=on)
  gdbserver: report correct status in thread stop race condition
  gdb: resume ongoing step after handling fork or vfork

 gdb/infcmd.c                                  |   2 +-
 gdb/inferior.h                                |   8 +-
 gdb/infrun.c                                  | 210 ++++++++++++++++--
 gdb/infrun.h                                  |  14 +-
 gdb/linux-nat.c                               |  17 +-
 gdb/remote.c                                  |   2 +-
 .../gdb.threads/next-fork-other-thread.c      |  86 +++++++
 .../gdb.threads/next-fork-other-thread.exp    | 116 ++++++++++
 .../gdb.threads/vfork-multi-inferior-sleep.c  |  25 +++
 .../gdb.threads/vfork-multi-inferior.c        |  55 +++++
 .../gdb.threads/vfork-multi-inferior.exp      | 115 ++++++++++
 .../gdb.threads/vfork-multi-thread.c          |  74 ++++++
 .../gdb.threads/vfork-multi-thread.exp        |  96 ++++++++
 gdbserver/linux-low.cc                        |  60 +++--
 14 files changed, 801 insertions(+), 79 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/next-fork-other-thread.c
 create mode 100644 gdb/testsuite/gdb.threads/next-fork-other-thread.exp
 create mode 100644 gdb/testsuite/gdb.threads/vfork-multi-inferior-sleep.c
 create mode 100644 gdb/testsuite/gdb.threads/vfork-multi-inferior.c
 create mode 100644 gdb/testsuite/gdb.threads/vfork-multi-inferior.exp
 create mode 100644 gdb/testsuite/gdb.threads/vfork-multi-thread.c
 create mode 100644 gdb/testsuite/gdb.threads/vfork-multi-thread.exp


base-commit: 1adce770ea443ec73c8af29618c504495893d0b8
-- 
2.34.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/8] gdb/infrun: add reason parameter to stop_all_threads
  2022-01-17 16:27 [PATCH 0/8] Some fixes for handling vfork by multi-threaded programs Simon Marchi
@ 2022-01-17 16:27 ` Simon Marchi
  2022-03-31 15:05   ` Pedro Alves
  2022-01-17 16:27 ` [PATCH 2/8] gdb/linux-nat: remove check based on current_inferior in linux_handle_extended_wait Simon Marchi
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Simon Marchi @ 2022-01-17 16:27 UTC (permalink / raw)
  To: gdb-patches; +Cc: Simon Marchi

From: Simon Marchi <simon.marchi@efficios.com>

Add a "reason" parameter, only used to show in debug messages what is
the reason for stopping all threads.  This helped me understand the
debug logs while adding some new uses of stop_all_threads, so I am
proposing to merge it.

Change-Id: I66c8c335ebf41836a7bc3d5fe1db92c195f65e55
---
 gdb/infcmd.c |  2 +-
 gdb/infrun.c | 10 +++++-----
 gdb/infrun.h | 11 +++++++----
 gdb/remote.c |  2 +-
 4 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/gdb/infcmd.c b/gdb/infcmd.c
index 9f4ed8bff138..16b1ca65d812 100644
--- a/gdb/infcmd.c
+++ b/gdb/infcmd.c
@@ -2473,7 +2473,7 @@ attach_post_wait (int from_tty, enum attach_post_wait_mode mode)
 	{
 	  struct thread_info *lowest = inferior_thread ();
 
-	  stop_all_threads ();
+	  stop_all_threads ("attaching");
 
 	  /* It's not defined which thread will report the attach
 	     stop.  For consistency, always select the thread with
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 2335fc74dc02..46e14d720cce 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -2391,7 +2391,7 @@ resume_1 (enum gdb_signal sig)
 	  /* Fallback to stepping over the breakpoint in-line.  */
 
 	  if (target_is_non_stop_p ())
-	    stop_all_threads ();
+	    stop_all_threads ("displaced stepping falling back on inline stepping");
 
 	  set_step_over_info (regcache->aspace (),
 			      regcache_read_pc (regcache), 0, tp->global_num);
@@ -4915,7 +4915,7 @@ handle_one (const wait_one_event &event)
 /* See infrun.h.  */
 
 void
-stop_all_threads (void)
+stop_all_threads (const char *reason)
 {
   /* We may need multiple passes to discover all threads.  */
   int pass;
@@ -4923,7 +4923,7 @@ stop_all_threads (void)
 
   gdb_assert (exists_non_stop_target ());
 
-  infrun_debug_printf ("starting");
+  INFRUN_SCOPED_DEBUG_START_END ("reason=%s", reason);
 
   scoped_restore_current_thread restore_thread;
 
@@ -7963,7 +7963,7 @@ stop_waiting (struct execution_control_state *ecs)
   /* If all-stop, but there exists a non-stop target, stop all
      threads now that we're presenting the stop to the user.  */
   if (!non_stop && exists_non_stop_target ())
-    stop_all_threads ();
+    stop_all_threads ("presenting stop to user in all-stop");
 }
 
 /* Like keep_going, but passes the signal to the inferior, even if the
@@ -8063,7 +8063,7 @@ keep_going_pass_signal (struct execution_control_state *ecs)
 	 we're about to step over, otherwise other threads could miss
 	 it.  */
       if (step_over_info_valid_p () && target_is_non_stop_p ())
-	stop_all_threads ();
+	stop_all_threads ("starting in-line step-over");
 
       /* Stop stepping if inserting breakpoints fails.  */
       try
diff --git a/gdb/infrun.h b/gdb/infrun.h
index 3e84805accb5..be600fd8f7da 100644
--- a/gdb/infrun.h
+++ b/gdb/infrun.h
@@ -40,8 +40,8 @@ extern bool debug_infrun;
 
 /* Print "infrun" start/end debug statements.  */
 
-#define INFRUN_SCOPED_DEBUG_START_END(msg) \
-  scoped_debug_start_end (debug_infrun, "infrun", msg)
+#define INFRUN_SCOPED_DEBUG_START_END(fmt, ...) \
+  scoped_debug_start_end (debug_infrun, "infrun", fmt, ##__VA_ARGS__)
 
 /* Print "infrun" enter/exit debug statements.  */
 
@@ -139,8 +139,11 @@ extern void set_last_target_status (process_stratum_target *target, ptid_t ptid,
    target_wait().  */
 extern void nullify_last_target_wait_ptid ();
 
-/* Stop all threads.  Only returns after everything is halted.  */
-extern void stop_all_threads (void);
+/* Stop all threads.  Only returns after everything is halted.
+
+   REASON is a string indicating the reason why we stop all threads, used in
+   debug messages.  */
+extern void stop_all_threads (const char *reason);
 
 extern void prepare_for_detach (void);
 
diff --git a/gdb/remote.c b/gdb/remote.c
index b126532af45d..8a5cc1108007 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -4641,7 +4641,7 @@ remote_target::process_initial_stop_replies (int from_tty)
 	gdb_assert (!this->is_async_p ());
 	SCOPE_EXIT { target_async (0); };
 	target_async (1);
-	stop_all_threads ();
+	stop_all_threads ("remote connect in all-stop");
       }
 
       /* If all threads of an inferior were already stopped, we
-- 
2.34.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 2/8] gdb/linux-nat: remove check based on current_inferior in linux_handle_extended_wait
  2022-01-17 16:27 [PATCH 0/8] Some fixes for handling vfork by multi-threaded programs Simon Marchi
  2022-01-17 16:27 ` [PATCH 1/8] gdb/infrun: add reason parameter to stop_all_threads Simon Marchi
@ 2022-01-17 16:27 ` Simon Marchi
  2022-03-31 16:12   ` Pedro Alves
  2022-01-17 16:27 ` [PATCH 3/8] gdb: replace inferior::waiting_for_vfork_done with inferior::thread_waiting_for_vfork_done Simon Marchi
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Simon Marchi @ 2022-01-17 16:27 UTC (permalink / raw)
  To: gdb-patches

The check removed by this patch, using current_inferior, looks wrong.
When debugging multiple inferiors with the Linux native target and
linux_handle_extended_wait is called, there's no guarantee about which
is the current inferior.  The vfork-done event we receive could be for
any inferior.  If the vfork-done event is for a non-current inferior, we
end up wrongfully ignoring it.  As a result, the core never processes a
TARGET_WAITKIND_VFORK_DONE event, program_space::breakpoints_not_allowed
is never cleared, and breakpoints are never reinserted.  However,
because the Linux native target decided to ignore the event, it resumed
the thread - while breakpoints out.  And that's bad.

The proposed fix is to remove this check.  Always report vfork-done
events and let infrun's logic decide if it should be ignored.  We don't
save much cycles by filtering the event here.

Add a test that sets replicates the situation described above.  See
comments in the test for more details.

Change-Id: Ibe33c1716c3602e847be6c2093120696f2286fbf
---
 gdb/linux-nat.c                               |  17 +--
 .../gdb.threads/vfork-multi-inferior-sleep.c  |  25 ++++
 .../gdb.threads/vfork-multi-inferior.c        |  55 +++++++++
 .../gdb.threads/vfork-multi-inferior.exp      | 115 ++++++++++++++++++
 4 files changed, 199 insertions(+), 13 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/vfork-multi-inferior-sleep.c
 create mode 100644 gdb/testsuite/gdb.threads/vfork-multi-inferior.c
 create mode 100644 gdb/testsuite/gdb.threads/vfork-multi-inferior.exp

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index b926fb5eba9e..7de4da03a34f 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -2096,20 +2096,11 @@ linux_handle_extended_wait (struct lwp_info *lp, int status)
 
   if (event == PTRACE_EVENT_VFORK_DONE)
     {
-      if (current_inferior ()->waiting_for_vfork_done)
-	{
-	  linux_nat_debug_printf
-	    ("Got expected PTRACE_EVENT_VFORK_DONE from LWP %ld: stopping",
-	     lp->ptid.lwp ());
-
-	  ourstatus->set_vfork_done ();
-	  return 0;
-	}
-
       linux_nat_debug_printf
-	("Got PTRACE_EVENT_VFORK_DONE from LWP %ld: ignoring", lp->ptid.lwp ());
-
-      return 1;
+	("Got expected PTRACE_EVENT_VFORK_DONE from LWP %ld",
+	 lp->ptid.lwp ());
+	ourstatus->set_vfork_done ();
+	return 0;
     }
 
   internal_error (__FILE__, __LINE__,
diff --git a/gdb/testsuite/gdb.threads/vfork-multi-inferior-sleep.c b/gdb/testsuite/gdb.threads/vfork-multi-inferior-sleep.c
new file mode 100644
index 000000000000..a26ae07d53b7
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/vfork-multi-inferior-sleep.c
@@ -0,0 +1,25 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2022 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <unistd.h>
+
+int
+main (void)
+{
+  sleep (30);
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/vfork-multi-inferior.c b/gdb/testsuite/gdb.threads/vfork-multi-inferior.c
new file mode 100644
index 000000000000..1b7b00d3c755
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/vfork-multi-inferior.c
@@ -0,0 +1,55 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2022 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <unistd.h>
+#include <sys/wait.h>
+#include <assert.h>
+
+static void
+should_break_here (void)
+{
+}
+
+int
+main (void)
+{
+  int i;
+
+  for (i = 0; i < NR_LOOPS; i++)
+    {
+      int pid = vfork ();
+
+      if (pid != 0)
+	{
+	  /* Parent */
+	  int stat;
+	  int ret = waitpid (pid, &stat, 0);
+	  assert (ret == pid);
+	  assert (WIFEXITED (stat));
+	  assert (WEXITSTATUS (stat) == 12);
+
+	  should_break_here ();
+	}
+      else
+	{
+	  /* Child */
+	  _exit(12);
+	}
+    }
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/vfork-multi-inferior.exp b/gdb/testsuite/gdb.threads/vfork-multi-inferior.exp
new file mode 100644
index 000000000000..22a1476eaacc
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/vfork-multi-inferior.exp
@@ -0,0 +1,115 @@
+# Copyright 2020-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test handling a vfork while another inferior is running.  The bug that
+# prompted writing this test case was in the Linux native target.  The target
+# assumed that the vfork-done event it received was for the current inferior
+# (an invalid assumption, the current inferior is the one randomly selected by
+# do_target_wait (at the time of writing).  This caused the target to drop the
+# vfork-done event, because it was seen as unneeded and to restart the thread
+# as if nothing happened.  This however resulted in the thread running with
+# breakpoints not inserted.
+#
+# To catch the bug, this test verifies that we can hit a breakpoint after a
+# vfork call, while a second inferior runs in the background.
+
+if [use_gdb_stub] {
+    unsupported "test uses multiple inferiors"
+    return
+}
+
+standard_testfile .c -sleep.c
+
+set srcfile_sleep $srcfile2
+set binfile_sleep ${binfile}-sleep
+
+# The reproducibility of the bug depends on which inferior randomly selects in
+# do_target_wait when consuming the vfork-done event.  Since GDB doesn't call
+# srand(), we are likely to always see the same sequence of inferior selected by
+# do_target_wait, which can hide the bug if you are not "lucky".  To work
+# around that, call vfork and hit the breakpoint in a loop, it makes it
+# somewhat likely that the wrong inferior will be selected eventually.
+set nr_loops 20
+
+# Compile the main program that calls vfork and hits a breakpoint.
+set opts [list debug additional_flags=-DNR_LOOPS=$nr_loops]
+if  { [gdb_compile "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable \
+	$opts] != "" } {
+    untested "failed to compile"
+    return -1
+}
+
+# Compile the secondary program, which just sleeps.
+if  { [gdb_compile "${srcdir}/${subdir}/${srcfile_sleep}" "${binfile_sleep}" executable \
+	{debug}] != "" } {
+    untested "failed to compile"
+    return -1
+}
+
+# We exercise two methods of getting a second inferior to execute while the
+# first one vforks.  METHOD can be:
+#
+#   - non-stop: start GDB with non-stop on and run the second inferior in
+#               background.
+#   - schedule-multiple: set "schedule-multiple on", this will make "continue"
+#                        resume both inferiors.
+proc do_test {method} {
+    save_vars { ::GDBFLAGS } {
+	if { $method == "non-stop" } {
+	    append ::GDBFLAGS " -ex \"set non-stop on\""
+	}
+	clean_restart
+    }
+
+    # Start the second inferior in background.
+    gdb_test "add-inferior" "Added inferior 2.*"
+    gdb_test "inferior 2" "Switching to inferior 2 .*"
+    gdb_file_cmd ${::binfile_sleep}
+    if { $method == "non-stop" } {
+	gdb_test "run &" "Starting program: .*" "run inferior 2"
+    } else {
+	gdb_test "start" "Temporary breakpoint $::decimal, main .*" \
+		"start inferior 2"
+    }
+
+    # Start the first inferior.
+    gdb_test "inferior 1" "Switching to inferior 1 .*"
+    gdb_file_cmd ${::binfile}
+    gdb_test "break should_break_here" "Breakpoint $::decimal at .*"
+    gdb_test "start" "Thread 1.1 .* hit Temporary breakpoint.*" \
+	"start inferior 1"
+
+    # Only enable schedule-multiple this late, because of:
+    # https://sourceware.org/bugzilla/show_bug.cgi?id=28777
+    if { $method == "schedule-multiple" } {
+	gdb_test_no_output "set schedule-multiple on"
+    }
+
+
+    # Continue over vfork and until the breakpoint.  The number of loops here
+    # matches the number of loops in the program.  So if a breakpoint is missed
+    # at some point, a "continue" will wrongfully continue until the end of the
+    # program, which will fail the test.
+    for {set i 0} {$i < $::nr_loops} {incr i} {
+	with_test_prefix "i=$i" {
+	    gdb_test "continue" \
+		"Thread 1.1 .* hit Breakpoint $::decimal, should_break_here.*"
+	}
+    }
+}
+
+foreach_with_prefix method {schedule-multiple non-stop} {
+    do_test $method
+}
-- 
2.34.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3/8] gdb: replace inferior::waiting_for_vfork_done with inferior::thread_waiting_for_vfork_done
  2022-01-17 16:27 [PATCH 0/8] Some fixes for handling vfork by multi-threaded programs Simon Marchi
  2022-01-17 16:27 ` [PATCH 1/8] gdb/infrun: add reason parameter to stop_all_threads Simon Marchi
  2022-01-17 16:27 ` [PATCH 2/8] gdb/linux-nat: remove check based on current_inferior in linux_handle_extended_wait Simon Marchi
@ 2022-01-17 16:27 ` Simon Marchi
  2022-03-31 18:17   ` Pedro Alves
  2022-01-17 16:27 ` [PATCH 4/8] gdb/infrun: add inferior parameters to stop_all_threads and restart_threads Simon Marchi
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Simon Marchi @ 2022-01-17 16:27 UTC (permalink / raw)
  To: gdb-patches

The inferior::waiting_for_vfork_done flag indicates that some thread in
that inferior is waiting for a vfork-done event.  Subsequent patches
will need to know which thread is waiting for that event.

I think there is a latent buglet in that waiting_for_vfork_done is
currently not reset on inferior exec or exit.  I could imagine that if a
thread in the parent process calls exec or exit while another thread of
the parent process is waiting for its vfork child to exec or exit, we
could end up with inferior::waiting_for_vfork_done without a thread
actually waiting for a vfork-done event anymore.  And since that flag is
checked in resume_1, things could misbehave there.

Since the new field points to a thread_info object, and those are
destroyed on exec or exit, it could be worse now since we could try to
access freed memory, if thread_waiting_for_vfork_done were to point to a
stale thread_info.  To avoid this, clear the field in
infrun_inferior_exit and infrun_inferior_execd.

Change-Id: I31b847278613a49ba03fc4915f74d9ceb228fdce
---
 gdb/inferior.h |  8 ++++----
 gdb/infrun.c   | 14 +++++++++-----
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/gdb/inferior.h b/gdb/inferior.h
index ec0fb6e8b16c..7b82703f1470 100644
--- a/gdb/inferior.h
+++ b/gdb/inferior.h
@@ -531,10 +531,10 @@ class inferior : public refcounted_object,
      exits or execs.  */
   bool pending_detach = false;
 
-  /* True if this inferior is a vfork parent waiting for a vfork child
-     not under our control to be done with the shared memory region,
-     either by exiting or execing.  */
-  bool waiting_for_vfork_done = false;
+  /* If non-nullptr, points to a thread that called vfork and is now waiting
+     for a vfork child not under our control to be done with the shared memory
+     region, either by exiting or execing.  */
+  thread_info *thread_waiting_for_vfork_done = nullptr;
 
   /* True if we're in the process of detaching from this inferior.  */
   bool detaching = false;
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 46e14d720cce..40b4bdd73130 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -507,7 +507,8 @@ holding the child stopped.  Try \"set detach-on-fork\" or \
 	     insert breakpoints, so that we can debug it.  A
 	     subsequent child exec or exit is enough to know when does
 	     the child stops using the parent's address space.  */
-	  parent_inf->waiting_for_vfork_done = detach_fork;
+	  parent_inf->thread_waiting_for_vfork_done
+	    = detach_fork ? inferior_thread () : nullptr;
 	  parent_inf->pspace->breakpoints_not_allowed = detach_fork;
 	}
     }
@@ -639,7 +640,7 @@ holding the child stopped.  Try \"set detach-on-fork\" or \
 	  child_inf->pending_detach = 0;
 	  parent_inf->vfork_child = child_inf;
 	  parent_inf->pending_detach = detach_fork;
-	  parent_inf->waiting_for_vfork_done = 0;
+	  parent_inf->thread_waiting_for_vfork_done = nullptr;
 	}
       else if (detach_fork)
 	{
@@ -1484,6 +1485,7 @@ static void
 infrun_inferior_exit (struct inferior *inf)
 {
   inf->displaced_step_state.reset ();
+  inf->thread_waiting_for_vfork_done = nullptr;
 }
 
 static void
@@ -1502,6 +1504,8 @@ infrun_inferior_execd (inferior *inf)
      one in progress at the time of the exec, it must have been the exec'ing
      thread.  */
   clear_step_over_info ();
+
+  inf->thread_waiting_for_vfork_done = nullptr;
 }
 
 /* If ON, and the architecture supports it, GDB will use displaced
@@ -2254,7 +2258,7 @@ resume_1 (enum gdb_signal sig)
   /* Depends on stepped_breakpoint.  */
   step = currently_stepping (tp);
 
-  if (current_inferior ()->waiting_for_vfork_done)
+  if (current_inferior ()->thread_waiting_for_vfork_done != nullptr)
     {
       /* Don't try to single-step a vfork parent that is waiting for
 	 the child to get out of the shared memory region (by exec'ing
@@ -2374,7 +2378,7 @@ resume_1 (enum gdb_signal sig)
       && use_displaced_stepping (tp)
       && !step_over_info_valid_p ()
       && sig == GDB_SIGNAL_0
-      && !current_inferior ()->waiting_for_vfork_done)
+      && current_inferior ()->thread_waiting_for_vfork_done == nullptr)
     {
       displaced_step_prepare_status prepare_status
 	= displaced_step_prepare (tp);
@@ -5618,7 +5622,7 @@ handle_inferior_event (struct execution_control_state *ecs)
 
       context_switch (ecs);
 
-      current_inferior ()->waiting_for_vfork_done = 0;
+      current_inferior ()->thread_waiting_for_vfork_done = nullptr;
       current_inferior ()->pspace->breakpoints_not_allowed = 0;
 
       if (handle_stop_requested (ecs))
-- 
2.34.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 4/8] gdb/infrun: add inferior parameters to stop_all_threads and restart_threads
  2022-01-17 16:27 [PATCH 0/8] Some fixes for handling vfork by multi-threaded programs Simon Marchi
                   ` (2 preceding siblings ...)
  2022-01-17 16:27 ` [PATCH 3/8] gdb: replace inferior::waiting_for_vfork_done with inferior::thread_waiting_for_vfork_done Simon Marchi
@ 2022-01-17 16:27 ` Simon Marchi
  2022-03-31 18:17   ` Pedro Alves
  2022-01-17 16:27 ` [PATCH 5/8] gdb/infrun: add logging statement to do_target_resume Simon Marchi
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Simon Marchi @ 2022-01-17 16:27 UTC (permalink / raw)
  To: gdb-patches

A following patch will want to stop all threads of a given inferior (as
opposed to all threads of all inferiors) while handling a vfork, and
restart them after.  To help with this, add inferior parameters to
stop_all_threads and restart_threads.  This is done as a separate patch
to make sure this doesn't cause regressions on its own, and to keep the
following patches more concise.

No visible changes are expected here, since all calls sites pass
nullptr, which should keep the existing behavior.

Change-Id: I4d9ba886ce842042075b4e346cfa64bbe2580dbf
---
 gdb/infrun.c | 39 +++++++++++++++++++++++++++++++--------
 gdb/infrun.h |  7 +++++--
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 40b4bdd73130..4d12c72f89fd 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -3754,7 +3754,8 @@ struct wait_one_event
 };
 
 static bool handle_one (const wait_one_event &event);
-static void restart_threads (struct thread_info *event_thread);
+static void restart_threads (struct thread_info *event_thread,
+			     inferior *inf = nullptr);
 
 /* Prepare and stabilize the inferior for detaching it.  E.g.,
    detaching while a thread is displaced stepping is a recipe for
@@ -4919,7 +4920,7 @@ handle_one (const wait_one_event &event)
 /* See infrun.h.  */
 
 void
-stop_all_threads (const char *reason)
+stop_all_threads (const char *reason, inferior *inf)
 {
   /* We may need multiple passes to discover all threads.  */
   int pass;
@@ -4927,22 +4928,29 @@ stop_all_threads (const char *reason)
 
   gdb_assert (exists_non_stop_target ());
 
-  INFRUN_SCOPED_DEBUG_START_END ("reason=%s", reason);
+  INFRUN_SCOPED_DEBUG_START_END ("reason=%s, inf=%d", reason,
+				 inf != nullptr ? inf->num : -1);
 
   scoped_restore_current_thread restore_thread;
 
-  /* Enable thread events of all targets.  */
+  /* Enable thread events on relevant targets.  */
   for (auto *target : all_non_exited_process_targets ())
     {
+      if (inf != nullptr && inf->process_target () != target)
+	continue;
+
       switch_to_target_no_thread (target);
       target_thread_events (true);
     }
 
   SCOPE_EXIT
     {
-      /* Disable thread events of all targets.  */
+      /* Disable thread events on relevant targets.  */
       for (auto *target : all_non_exited_process_targets ())
 	{
+	  if (inf != nullptr && inf->process_target () != target)
+	    continue;
+
 	  switch_to_target_no_thread (target);
 	  target_thread_events (false);
 	}
@@ -4967,6 +4975,9 @@ stop_all_threads (const char *reason)
 
 	  for (auto *target : all_non_exited_process_targets ())
 	    {
+	      if (inf != nullptr && inf->process_target () != target)
+		continue;
+
 	      switch_to_target_no_thread (target);
 	      update_thread_list ();
 	    }
@@ -4975,6 +4986,9 @@ stop_all_threads (const char *reason)
 	     to tell the target to stop.  */
 	  for (thread_info *t : all_non_exited_threads ())
 	    {
+	      if (inf != nullptr && t->inf != inf)
+		continue;
+
 	      /* For a single-target setting with an all-stop target,
 		 we would not even arrive here.  For a multi-target
 		 setting, until GDB is able to handle a mixture of
@@ -5717,17 +5731,26 @@ handle_inferior_event (struct execution_control_state *ecs)
 }
 
 /* Restart threads back to what they were trying to do back when we
-   paused them for an in-line step-over.  The EVENT_THREAD thread is
-   ignored.  */
+   paused them (because of an in-line step-over or vfork, for example).
+   The EVENT_THREAD thread is ignored (not restarted).
+
+   If INF is non-nullptr, only resume threads from INF.  */
 
 static void
-restart_threads (struct thread_info *event_thread)
+restart_threads (struct thread_info *event_thread, inferior *inf)
 {
+  INFRUN_SCOPED_DEBUG_START_END ("event_thread=%s, inf=%d",
+				 event_thread->ptid.to_string ().c_str (),
+				 inf != nullptr ? inf->num : -1);
+
   /* In case the instruction just stepped spawned a new thread.  */
   update_thread_list ();
 
   for (thread_info *tp : all_non_exited_threads ())
     {
+      if (inf != nullptr && tp->inf != inf)
+	continue;
+
       if (tp->inf->detaching)
 	{
 	  infrun_debug_printf ("restart threads: [%s] inferior detaching",
diff --git a/gdb/infrun.h b/gdb/infrun.h
index be600fd8f7da..7e135ed420b7 100644
--- a/gdb/infrun.h
+++ b/gdb/infrun.h
@@ -142,8 +142,11 @@ extern void nullify_last_target_wait_ptid ();
 /* Stop all threads.  Only returns after everything is halted.
 
    REASON is a string indicating the reason why we stop all threads, used in
-   debug messages.  */
-extern void stop_all_threads (const char *reason);
+   debug messages.
+
+   If INF is non-nullptr, stop all threads of that inferior.  Otherwise, stop
+   all threads of all inferiors.  */
+extern void stop_all_threads (const char *reason, inferior *inf = nullptr);
 
 extern void prepare_for_detach (void);
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 5/8] gdb/infrun: add logging statement to do_target_resume
  2022-01-17 16:27 [PATCH 0/8] Some fixes for handling vfork by multi-threaded programs Simon Marchi
                   ` (3 preceding siblings ...)
  2022-01-17 16:27 ` [PATCH 4/8] gdb/infrun: add inferior parameters to stop_all_threads and restart_threads Simon Marchi
@ 2022-01-17 16:27 ` Simon Marchi
  2022-03-31 18:18   ` Pedro Alves
  2022-01-17 16:27 ` [PATCH 6/8] gdb: fix handling of vfork by multi-threaded program (follow-fork-mode=parent, detach-on-fork=on) Simon Marchi
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Simon Marchi @ 2022-01-17 16:27 UTC (permalink / raw)
  To: gdb-patches

This helped me, it shows which ptid we actually call target_resume with.

Change-Id: I2dfd771e83df8c25f39371a13e3e91dc7882b73d
---
 gdb/infrun.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 4d12c72f89fd..c976e3efbfb4 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -2189,6 +2189,10 @@ do_target_resume (ptid_t resume_ptid, bool step, enum gdb_signal sig)
   else
     target_pass_signals (signal_pass);
 
+  infrun_debug_printf ("resume_ptid=%s, step=%d, sig=%s",
+		       resume_ptid.to_string ().c_str (),
+		       step, gdb_signal_to_symbol_string (sig));
+
   target_resume (resume_ptid, step, sig);
 
   if (target_can_async_p ())
-- 
2.34.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 6/8] gdb: fix handling of vfork by multi-threaded program (follow-fork-mode=parent, detach-on-fork=on)
  2022-01-17 16:27 [PATCH 0/8] Some fixes for handling vfork by multi-threaded programs Simon Marchi
                   ` (4 preceding siblings ...)
  2022-01-17 16:27 ` [PATCH 5/8] gdb/infrun: add logging statement to do_target_resume Simon Marchi
@ 2022-01-17 16:27 ` Simon Marchi
  2022-03-31 18:21   ` Pedro Alves
  2022-01-17 16:27 ` [PATCH 7/8] gdbserver: report correct status in thread stop race condition Simon Marchi
  2022-01-17 16:27 ` [PATCH 8/8] gdb: resume ongoing step after handling fork or vfork Simon Marchi
  7 siblings, 1 reply; 23+ messages in thread
From: Simon Marchi @ 2022-01-17 16:27 UTC (permalink / raw)
  To: gdb-patches; +Cc: Simon Marchi

From: Simon Marchi <simon.marchi@efficios.com>

There is a problem with how GDB handles a vfork happening in a
multi-threaded program.  This problem was reported to me by somebody not
using vfork directly, but using system(3) in a multi-threaded program,
which may be implemented using vfork.

This patch only deals about the follow-fork-mode=parent,
detach-on-fork=on case, because it would be too much to chew at once to
fix the bugs in the other cases as well (I tried).

The problem
-----------

When a program vforks, the parent thread is suspended by the kernel
until the child process exits or execs.  Specifically, in a
multi-threaded program, only the thread that called vfork is suspended,
other threads keep running freely. This is documented in the vfork(2)
man page ("Caveats" section).

Let's suppose GDB is handling a vfork and the user's desire is to detach
from the child. Before detaching the child, GDB must remove the software
breakpoints inserted in the shared parent/child address space, in case
there's a breakpoint in the path the child is going to take before
exec'ing or exit'ing (unlikely, but possible). Otherwise the child could
hit a breakpoint instruction while running outside the control of GDB,
which would make it crash.  GDB must also avoid re-inserting breakpoints
in the parent as long as it didn't receive the "vfork done" event (that
is, when the child has exited or execed): since the address space is
shared with the child, that would re-insert breakpoints in the child
process also. So what GDB does is:

  1. Receive "vfork" event for the parent
  2. Remove breakpoints from the (shared) address space and set
     program_space::breakpoints_not_allowed to avoid re-inserting them
  3. Detach from the child thread
  4. Resume the parent
  5. Wait for and receive "vfork done" event for the parent
  6. Clean program_space::breakpoints_not_allowed and re-insert
     breakpoints
  7. Resume the parent

Resuming the parent at step 4 is necessary in order for the kernel to
report the "vfork done" event.  The kernel won't report a ptrace event
for a thread that is ptrace-stopped.  But the theory behind this is that
between steps 4 and 5, the parent won't actually do any progress even
though it is ptrace-resumed, because the kernel keeps it suspended,
waiting for the child to exec or exit.  So it doesn't matter for that
thread if breakpoints are not inserted.

The problem is when the program is multi-threaded.  In step 4, GDB
resumes all threads of the parent. The thread that did the vfork stays
suspended by the kernel, so that's fine. But other threads are running
freely while breakpoints are removed, which is a problem because they
could miss a breakpoint that they should have hit.

The problem is present with all-stop and non-stop targets.  The only
difference is that with an all-stop targets, the other threads are
stopped by the target when it reports the vfork event and are resumed by
the target when GDB resumes the parent.  With a non-stop target, the
other threads are simply never stopped.

The fix
-------

There many combinations of settings to consider (all-stop/non-stop,
target-non-stop on/off, follow-fork-mode parent/child, detach-on-fork
on/off, schedule-multiple on/off), but for this patch I restrict the
scope to follow-fork-mode=parent, detach-on-fork=on.  That's the
"default" case, where we detach the child and keep debugging the
parent.  I tried to fix them all, but it's just too much to do at once.
The code paths and behaviors for when we don't detach the child are
completely different.

The guiding principle for this patch is that all threads of the vforking
inferior should be stopped as long as breakpoints are removed.  This is
similar to handling in-line step-overs, in a way.

For non-stop targets (the default on Linux native), this is what
happens:

 - In follow_fork, we call stop_all_threads to stop all threads of the
   inferior
 - In follow_fork_inferior, we record the vfork parent thread in
   inferior::thread_waiting_for_vfork_done
 - Back in handle_inferior_event, we call keep_going, which resumes only
   the event thread (this is already the case, with a non-stop target).
   This is the thread that will be waiting for vfork-done.
 - When we get the vfork-done event, we go in the (new) handle_vfork_done
   function to restart the previously stopped threads.

In the same scenario, but with an all-stop target:

 - In follow_fork, no need to stop all threads of the inferior, the
   target has stopped all threads of all its inferiors before returning
   the event.
 - In follow_fork_inferior, we record the vfork parent thread in
   inferior::thread_waiting_for_vfork_done.
 - Back in handle_inferior_event, we also call keep_going.  However, we
   only want to resume the event thread here, not all inferior threads.
   In internal_resume_ptid (called by resume_1), we therefore now check
   whether one of the inferiors we are about to resume has
   thread_waiting_for_vfork_done set.  If so, we only resume that
   thread.  When resuming multiple inferiors, one vforking and one not
   vforking, we could resume the vforking thread from the vforking
   inferior plus all threads from the vforking inferior.  However, the
   target_resume interface today does not allow this.
 - When we get the vfork-done event, the existing call to keep_going
   naturally resumes all threads.

Testing-wise, add a test that tries to make the main thread hit a
breakpoint while a secondary thread calls vfork.  Without the fix, the
main thread keeps going while breakpoints are removed, resulting in a
missed breakpoint and the program exiting.

Change-Id: I20eb78e17ca91f93c19c2b89a7e12c382ee814a1
---
 gdb/infrun.c                                  | 134 ++++++++++++++++--
 .../gdb.threads/vfork-multi-thread.c          |  74 ++++++++++
 .../gdb.threads/vfork-multi-thread.exp        |  96 +++++++++++++
 3 files changed, 296 insertions(+), 8 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/vfork-multi-thread.c
 create mode 100644 gdb/testsuite/gdb.threads/vfork-multi-thread.exp

diff --git a/gdb/infrun.c b/gdb/infrun.c
index c976e3efbfb4..75252be22038 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -95,6 +95,11 @@ static void resume (gdb_signal sig);
 
 static void wait_for_inferior (inferior *inf);
 
+static void restart_threads (struct thread_info *event_thread,
+			     inferior *inf = nullptr);
+
+static bool start_step_over (void);
+
 /* Asynchronous signal handler registered as event loop source for
    when we have pending events ready to be passed to the core.  */
 static struct async_event_handler *infrun_async_inferior_event_token;
@@ -431,6 +436,8 @@ holding the child stopped.  Try \"set detach-on-fork\" or \
   inferior *parent_inf = current_inferior ();
   inferior *child_inf = nullptr;
 
+  gdb_assert (parent_inf->thread_waiting_for_vfork_done == nullptr);
+
   if (!follow_child)
     {
       /* Detach new forked process?  */
@@ -640,7 +647,6 @@ holding the child stopped.  Try \"set detach-on-fork\" or \
 	  child_inf->pending_detach = 0;
 	  parent_inf->vfork_child = child_inf;
 	  parent_inf->pending_detach = detach_fork;
-	  parent_inf->thread_waiting_for_vfork_done = nullptr;
 	}
       else if (detach_fork)
 	{
@@ -773,6 +779,12 @@ follow_fork ()
 	parent = inferior_ptid;
 	child = tp->pending_follow.child_ptid ();
 
+	/* If handling a vfork, stop all the inferior's threads, they will be
+	   restarted when the vfork shared region is complete.  */
+	if (tp->pending_follow.kind () == TARGET_WAITKIND_VFORKED
+	    && target_is_non_stop_p ())
+	  stop_all_threads ("handling vfork", tp->inf);
+
 	process_stratum_target *parent_targ = tp->inf->process_target ();
 	/* Set up inferior(s) as specified by the caller, and tell the
 	   target to do whatever is necessary to follow either parent
@@ -1034,6 +1046,53 @@ handle_vfork_child_exec_or_exit (int exec)
     }
 }
 
+/* Handle TARGET_WAITKIND_VFORK_DONE.  */
+
+static void
+handle_vfork_done (thread_info *event_thread)
+{
+  /* We only care about this event if inferior::thread_waiting_for_vfork_done is
+     set, that is if we are waiting for a vfork child not under our control
+     (because we detached it) to exec or exit.
+
+     If an inferior has vforked and we are debugging the child, we don't use
+     the vfork-done event to get notified about the end of the shared address
+     space window).  We rely instead on the child's exec or exit event, and the
+     inferior::vfork_{parent,child} fields are used instead.  See
+     handle_vfork_child_exec_or_exit for that.  */
+  if (event_thread->inf->thread_waiting_for_vfork_done == nullptr)
+    {
+      infrun_debug_printf ("not waiting for a vfork-done event");
+      return;
+    }
+
+  INFRUN_SCOPED_DEBUG_ENTER_EXIT;
+
+  /* We stopped all threads (other than the vforking thread) of the inferior in
+     follow_fork and kept them stopped until now.  It should therefore not be
+     possible for another thread to have reported a vfork during that window.
+     If THREAD_WAITING_FOR_VFORK_DONE is set, it has to be the same thread whose
+     vfork-done we are handling right now.  */
+  gdb_assert (event_thread->inf->thread_waiting_for_vfork_done == event_thread);
+
+  event_thread->inf->thread_waiting_for_vfork_done = nullptr;
+  event_thread->inf->pspace->breakpoints_not_allowed = 0;
+
+  /* On non-stop targets, we stopped all the inferior's threads in follow_fork,
+     resume them now.  On all-stop targets, everything that needs to be resumed
+     will be when we resume the event thread.  */
+  if (target_is_non_stop_p ())
+    {
+      /* restart_threads and start_step_over may change the current thread, make
+	 sure we leave the event thread as the current thread.  */
+      scoped_restore_current_thread restore_thread;
+
+      insert_breakpoints ();
+      restart_threads (event_thread, event_thread->inf);
+      start_step_over ();
+    }
+}
+
 /* Enum strings for "set|show follow-exec-mode".  */
 
 static const char follow_exec_mode_new[] = "new";
@@ -1908,6 +1967,16 @@ start_step_over (void)
 	  continue;
 	}
 
+      if (tp->inf->thread_waiting_for_vfork_done)
+	{
+	  /* When we stop all threads, handling a vfork, any thread in the step
+	     over chain remains there.  A user could also try to continue a
+	     thread stopped at a breakpoint while another thread is waiting for
+	     a vfork-done event.  In any case, we don't want to start a step
+	     over right now.  */
+	  continue;
+	}
+
       /* Remove thread from the THREADS_TO_STEP chain.  If anything goes wrong
 	 while we try to prepare the displaced step, we don't add it back to
 	 the global step over chain.  This is to avoid a thread staying in the
@@ -2143,8 +2212,41 @@ internal_resume_ptid (int user_step)
      return a wildcard ptid.  */
   if (target_is_non_stop_p ())
     return inferior_ptid;
-  else
-    return user_visible_resume_ptid (user_step);
+
+  /* The rest of the function assumes non-stop==off and
+     target-non-stop==off.
+
+     If a thread is waiting for a vfork-done event, it means breakpoints are out
+     for this inferior (well, program space in fact).  We don't want to resume
+     any thread other than the one waiting for vfork done, otherwise these other
+     threads could miss breakpoints.  So if a thread in the resumption set is
+     waiting for a vfork-done event, resume only that thread.
+
+     The resumption set width depends on whether schedule-multiple is on or off.
+
+     Note that if the target_resume interface was more flexible, we could be
+     smarter here when schedule-multiple is on.  For example, imagine 3
+     inferiors with 2 threads each (1.1, 1.2, 2.1, 2.2, 3.1 and 3.2).  Threads
+     2.1 and 3.2 are both waiting for a vfork-done event.  Then we could ask the
+     target(s) to resume:
+
+      - All threads of inferior 1
+      - Thread 2.1
+      - Thread 3.2
+
+     Since we don't have that flexibility (we can only pass one ptid), just
+     resume the first thread waiting for a vfork-done event we find (e.g. thread
+     2.1).  */
+  if (sched_multi)
+    {
+      for (inferior *inf : all_non_exited_inferiors ())
+	if (inf->thread_waiting_for_vfork_done != nullptr)
+	  return inf->thread_waiting_for_vfork_done->ptid;
+    }
+  else if (current_inferior ()->thread_waiting_for_vfork_done != nullptr)
+    return current_inferior ()->thread_waiting_for_vfork_done->ptid;
+
+  return user_visible_resume_ptid (user_step);
 }
 
 /* Wrapper for target_resume, that handles infrun-specific
@@ -3258,6 +3360,19 @@ proceed (CORE_ADDR addr, enum gdb_signal siggnal)
 		continue;
 	      }
 
+	    /* If a thread of that inferior is waiting for a vfork-done
+	       (for a detached vfork child to exec or exit), breakpoints are
+	       removed.  We must not resume any thread of that inferior, other
+	       than the one waiting for the vfork-done.  */
+	    if (tp->inf->thread_waiting_for_vfork_done != nullptr
+		&& tp != tp->inf->thread_waiting_for_vfork_done)
+	      {
+		infrun_debug_printf ("[%s] another thread of this inferior is "
+				     "waiting for vfork-done",
+				     tp->ptid.to_string ().c_str ());
+		continue;
+	      }
+
 	    infrun_debug_printf ("resuming %s",
 				 tp->ptid.to_string ().c_str ());
 
@@ -3268,7 +3383,12 @@ proceed (CORE_ADDR addr, enum gdb_signal siggnal)
 	      error (_("Command aborted."));
 	  }
       }
-    else if (!cur_thr->resumed () && !thread_is_in_step_over_chain (cur_thr))
+    else if (!cur_thr->resumed ()
+	     && !thread_is_in_step_over_chain (cur_thr)
+	     /* In non-stop, forbid resume a thread if some other thread of
+		that inferior is waiting for a vfork-done event (this means
+		breakpoints are out for this inferior).  */
+	     && !(non_stop && cur_thr->inf->thread_waiting_for_vfork_done))
       {
 	/* The thread wasn't started, and isn't queued, run it now.  */
 	reset_ecs (ecs, cur_thr);
@@ -3758,8 +3878,6 @@ struct wait_one_event
 };
 
 static bool handle_one (const wait_one_event &event);
-static void restart_threads (struct thread_info *event_thread,
-			     inferior *inf = nullptr);
 
 /* Prepare and stabilize the inferior for detaching it.  E.g.,
    detaching while a thread is displaced stepping is a recipe for
@@ -5640,8 +5758,8 @@ handle_inferior_event (struct execution_control_state *ecs)
 
       context_switch (ecs);
 
-      current_inferior ()->thread_waiting_for_vfork_done = nullptr;
-      current_inferior ()->pspace->breakpoints_not_allowed = 0;
+      handle_vfork_done (ecs->event_thread);
+      gdb_assert (inferior_thread () == ecs->event_thread);
 
       if (handle_stop_requested (ecs))
 	return;
diff --git a/gdb/testsuite/gdb.threads/vfork-multi-thread.c b/gdb/testsuite/gdb.threads/vfork-multi-thread.c
new file mode 100644
index 000000000000..cd3dfcc1e653
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/vfork-multi-thread.c
@@ -0,0 +1,74 @@
+#include <assert.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/wait.h>
+
+// Start with:
+//
+//   ./gdb -nx -q --data-directory=data-directory repro -ex "tb break_here_first" -ex r -ex "b should_break_here"
+//
+// Then do "continue".
+//
+// The main thread will likely cross should_break_here while we are handling
+// the vfork and breakpoints are removed, therefore missing the breakpoint.
+
+static volatile int release_vfork = 0;
+static volatile int release_main = 0;
+
+static void *vforker(void *arg)
+{
+  while (!release_vfork);
+
+  pid_t pid = vfork ();
+  if (pid == 0) {
+    /* A vfork child is not supposed to mess with the state of the program,
+       but it is helpful for the purpose of this test.  */
+    release_main = 1;
+    _exit(7);
+  }
+
+  int stat;
+  int ret = waitpid (pid, &stat, 0);
+  assert (ret == pid);
+  assert (WIFEXITED (stat));
+  assert (WEXITSTATUS (stat) == 7);
+
+  return NULL;
+}
+
+static void should_break_here(void) {}
+
+int main()
+{
+
+  pthread_t thread;
+  int ret = pthread_create(&thread, NULL, vforker, NULL);
+  assert(ret == 0);
+
+  /* We break here first, while the thread is stuck on `!release_fork`.  */
+  release_vfork = 1;
+
+  /* We set a breakpoint on should_break_here.
+
+     We then set "release_fork" from the debugger and continue.  The main
+     thread hangs on `!release_main` while the non-main thread vforks.  During
+     the window of time where the two processes have a shared address space
+     (after vfork, before _exit), GDB removes the breakpoints from the address
+     space.  During that window, only the vfork-ing thread (the non-main
+     thread) is frozen by the kernel.  The main thread is free to execute.  The
+     child process sets `release_main`, releasing the main thread. A buggy GDB
+     would let the main thread execute during that window, leading to the
+     breakpoint on should_break_here being missed.  A fixed GDB does not resume
+     the threads of the vforking process other than the vforking thread.  When
+     the vfork child exits, the fixed GDB resumes the main thread, after
+     breakpoints are reinserted, so the breakpoint is not missed.  */
+
+  while (!release_main);
+
+  should_break_here();
+
+  pthread_join (thread, NULL);
+
+  return 6;
+}
diff --git a/gdb/testsuite/gdb.threads/vfork-multi-thread.exp b/gdb/testsuite/gdb.threads/vfork-multi-thread.exp
new file mode 100644
index 000000000000..d405411be012
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/vfork-multi-thread.exp
@@ -0,0 +1,96 @@
+# Copyright 2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test that a multi-threaded program doing a vfork doesn't miss breakpoints.
+#
+# When a program vforks, its address space is shared with the parent.  When we
+# detach a vfork child, we must keep breakpoints out of that shared address space
+# until the child either exits or execs, so that the child does not hit a
+# breakpoint while out of GDB's control.  During that time, threads from
+# the parent must be held stopped, otherwise they could miss breakpoints.
+#
+# The thread that did the vfork is suspended by the kernel, so it's not a
+# concern.  The other threads need to be manually stopped by GDB and resumed
+# once the vfork critical region is done.
+#
+# This test spawns one thread that calls vfork.  Meanwhile, the main thread
+# crosses a breakpoint.  A buggy GDB would let the main thread run while
+# breakpoints are removed, so the main thread would miss the breakpoint and run
+# until exit.
+
+standard_testfile
+
+if { [build_executable "failed to prepare" ${testfile} ${srcfile} {debug pthreads}] } {
+    return
+}
+
+set any "\[^\r\n\]*"
+
+# A bunch of util procedures to continue an inferior to an expected point.
+
+proc continue_to_parent_breakpoint {} {
+    gdb_test "continue" \
+	"hit Breakpoint .* should_break_here .*" \
+	"continue parent to breakpoint"
+}
+
+proc continue_to_parent_end {} {
+    gdb_test "continue" "Inferior 1.*exited with code 06.*" \
+	"continue parent to end"
+}
+
+# Run the test with the given GDB settings.
+
+proc do_test { target-non-stop non-stop follow-fork-mode detach-on-fork schedule-multiple } {
+    save_vars { ::GDBFLAGS } {
+	append ::GDBFLAGS " -ex \"maintenance set target-non-stop ${target-non-stop}\""
+	append ::GDBFLAGS " -ex \"set non-stop ${non-stop}\""
+	clean_restart ${::binfile}
+    }
+
+    gdb_test_no_output "set follow-fork-mode ${follow-fork-mode}"
+    gdb_test_no_output "set detach-on-fork ${detach-on-fork}"
+    gdb_test_no_output "set schedule-multiple ${schedule-multiple}"
+
+    # The message about thread 2 of inferior 1 exiting happens at a somewhat
+    # unpredictable moment, it's simpler to silence it than to try to match it.
+    gdb_test_no_output "set print thread-events off"
+
+    if { ![runto_main] } {
+	return
+    }
+
+    # The main thread is expected to hit this breakpoint.
+    gdb_test "break should_break_here" "Breakpoint $::decimal at .*"
+
+    continue_to_parent_breakpoint
+    continue_to_parent_end
+}
+
+# We only test with follow-fork-mode=parent and detach-on-fork=on at the
+# moment, but the loops below are written to make it easy to add other values
+# on these axes in the future.
+
+foreach_with_prefix target-non-stop {auto on off} {
+    foreach_with_prefix non-stop {off on} {
+	foreach_with_prefix follow-fork-mode {parent} {
+	    foreach_with_prefix detach-on-fork {on} {
+		foreach_with_prefix schedule-multiple {off on} {
+		    do_test ${target-non-stop} ${non-stop} ${follow-fork-mode} ${detach-on-fork} ${schedule-multiple}
+		}
+	    }
+	}
+    }
+}
-- 
2.34.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 7/8] gdbserver: report correct status in thread stop race condition
  2022-01-17 16:27 [PATCH 0/8] Some fixes for handling vfork by multi-threaded programs Simon Marchi
                   ` (5 preceding siblings ...)
  2022-01-17 16:27 ` [PATCH 6/8] gdb: fix handling of vfork by multi-threaded program (follow-fork-mode=parent, detach-on-fork=on) Simon Marchi
@ 2022-01-17 16:27 ` Simon Marchi
  2022-03-31 18:21   ` Pedro Alves
  2022-01-17 16:27 ` [PATCH 8/8] gdb: resume ongoing step after handling fork or vfork Simon Marchi
  7 siblings, 1 reply; 23+ messages in thread
From: Simon Marchi @ 2022-01-17 16:27 UTC (permalink / raw)
  To: gdb-patches; +Cc: Simon Marchi

From: Simon Marchi <simon.marchi@efficios.com>

The test introduced by the following patch would sometimes fail in this
configuration:

    FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=vfork: target-non-stop=on: non-stop=off: displaced-stepping=auto: i=14: next to for loop

The test has multiple threads constantly forking or vforking while the
main thread keep doing "next"s.

(After writing the commit message, I realized this also fixes a similar
failure in gdb.threads/forking-threads-plus-breakpoint.exp with the
native-gdbserver and native-extended-gdbserver boards.)

As stop_all_threads is called, because the main thread finished its
"next", it inevitably happens at some point that we ask the remote
target to stop a thread and wait() reports that this thread stopped with
a fork or vfork event, instead of the SIGSTOP we sent to try to stop it.

While running this test, I attached to GDBserver and stopped at
linux-low.cc:3626.  We can see that the status pulled from the kernel
for 2742805 is indeed a vfork event:

    (gdb) p/x w
    $3 = 0x2057f
    (gdb) p WIFSTOPPED(w)
    $4 = true
    (gdb) p WSTOPSIG(w)
    $5 = 5
    (gdb) p/x (w >> 8) & (PTRACE_EVENT_VFORK << 8)
    $6 = 0x200

However, the statement at line 3626 overrides that:

    ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));

OURSTATUS becomes "stopped by a SIGTRAP".  The information about the
fork or vfork is lost.

It's then all downhill from there, stop_all_threads eventually asks for
a thread list update.  That thread list includes the child of that
forgotten fork or vfork, the remote target goes "oh cool, a new process,
let's attach to it!", when in fact that vfork child's destiny was to be
detached.

My reverse-engineered understanding of the code around there is that the
if/else between lines 3562 and 3583 (in the original code) makes sure
OURSTATUS is always initialized (not "ignore").  Either the details are
already in event_child->waitstatus (in the case of fork/vfork, for
example), in which case we just copy event_child->waitstatus to
ourstatus.  Or, if the event is a plain "stopped by a signal" or a
syscall event, OURSTATUS is set to "stopped", but without a signal
number.  Lines 3601 to 3629 (in the original code) serve to fill in that
last bit of information.

The problem is that when `w` holds the vfork status, the code wrongfully
takes this branch, because WSTOPSIG(w) returns SIGTRAP:

  else if (current_thread->last_resume_kind == resume_stop
       && WSTOPSIG (w) != SIGSTOP)

The intent of this branch is, for example, when we sent SIGSTOP to try
to stop a thread, but wait() reports that it stopped with another signal
(that it must have received from somewhere else simultaneously), say
SIGWINCH.  In that case, we want to report the SIGWINCH.  But in our
fork/vfork case, we don't want to take this branch, as the thread didn't
really stop because it received a signal.  For the non "stopped by a
signal" and non "syscall signal" cases, we would ideally skip over all
that snippet that fills in the signal or syscall number.

The fix I propose is to move this snipppet of the else branch of the
if/else above.  In addition to moving the code, the last two "else if"
branches:

  else if (current_thread->last_resume_kind == resume_stop
	   && WSTOPSIG (w) != SIGSTOP)
    {
      /* A thread that has been requested to stop by GDB with vCont;t,
	 but, it stopped for other reasons.  */
      ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));
    }
  else if (ourstatus->kind () == TARGET_WAITKIND_STOPPED)
    ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));

are changed into a single else:

  else
    ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));

This is the default path we take if:

 - W is not a syscall status
 - W does not represent a SIGSTOP that have sent to stop the thread and
   therefore want to suppress it

Change-Id: If2dc1f0537a549c293f7fa3c53efd00e3e194e79
---
 gdbserver/linux-low.cc | 60 ++++++++++++++++++++----------------------
 1 file changed, 28 insertions(+), 32 deletions(-)

diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc
index d7c7336ff1f5..b86dff16f079 100644
--- a/gdbserver/linux-low.cc
+++ b/gdbserver/linux-low.cc
@@ -3559,6 +3559,9 @@ linux_process_target::wait_1 (ptid_t ptid, target_waitstatus *ourstatus,
 	unstop_all_lwps (1, event_child);
     }
 
+  /* At this point, we haven't set OURSTATUS.  This is where we do it.  */
+  gdb_assert (ourstatus->kind () == TARGET_WAITKIND_IGNORE);
+
   if (event_child->waitstatus.kind () != TARGET_WAITKIND_IGNORE)
     {
       /* If the reported event is an exit, fork, vfork or exec, let
@@ -3578,8 +3581,31 @@ linux_process_target::wait_1 (ptid_t ptid, target_waitstatus *ourstatus,
     }
   else
     {
-      /* The actual stop signal is overwritten below.  */
-      ourstatus->set_stopped (GDB_SIGNAL_0);
+      /* The LWP stopped due to a plain signal or a syscall signal.  Either way,
+         event_chid->waitstatus wasn't filled in with the details, so look at
+	 the wait status W.  */
+      if (WSTOPSIG (w) == SYSCALL_SIGTRAP)
+	{
+	  int syscall_number;
+
+	  get_syscall_trapinfo (event_child, &syscall_number);
+	  if (event_child->syscall_state == TARGET_WAITKIND_SYSCALL_ENTRY)
+	    ourstatus->set_syscall_entry (syscall_number);
+	  else if (event_child->syscall_state == TARGET_WAITKIND_SYSCALL_RETURN)
+	    ourstatus->set_syscall_return (syscall_number);
+	  else
+	    gdb_assert_not_reached ("unexpected syscall state");
+	}
+      else if (current_thread->last_resume_kind == resume_stop
+	       && WSTOPSIG (w) == SIGSTOP)
+	{
+	  /* A thread that has been requested to stop by GDB with vCont;t,
+	     and it stopped cleanly, so report as SIG0.  The use of
+	     SIGSTOP is an implementation detail.  */
+	  ourstatus->set_stopped (GDB_SIGNAL_0);
+	}
+      else
+	ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));
     }
 
   /* Now that we've selected our final event LWP, un-adjust its PC if
@@ -3598,36 +3624,6 @@ linux_process_target::wait_1 (ptid_t ptid, target_waitstatus *ourstatus,
 	}
     }
 
-  if (WSTOPSIG (w) == SYSCALL_SIGTRAP)
-    {
-      int syscall_number;
-
-      get_syscall_trapinfo (event_child, &syscall_number);
-      if (event_child->syscall_state == TARGET_WAITKIND_SYSCALL_ENTRY)
-	ourstatus->set_syscall_entry (syscall_number);
-      else if (event_child->syscall_state == TARGET_WAITKIND_SYSCALL_RETURN)
-	ourstatus->set_syscall_return (syscall_number);
-      else
-	gdb_assert_not_reached ("unexpected syscall state");
-    }
-  else if (current_thread->last_resume_kind == resume_stop
-	   && WSTOPSIG (w) == SIGSTOP)
-    {
-      /* A thread that has been requested to stop by GDB with vCont;t,
-	 and it stopped cleanly, so report as SIG0.  The use of
-	 SIGSTOP is an implementation detail.  */
-      ourstatus->set_stopped (GDB_SIGNAL_0);
-    }
-  else if (current_thread->last_resume_kind == resume_stop
-	   && WSTOPSIG (w) != SIGSTOP)
-    {
-      /* A thread that has been requested to stop by GDB with vCont;t,
-	 but, it stopped for other reasons.  */
-      ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));
-    }
-  else if (ourstatus->kind () == TARGET_WAITKIND_STOPPED)
-    ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));
-
   gdb_assert (step_over_bkpt == null_ptid);
 
   if (debug_threads)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 8/8] gdb: resume ongoing step after handling fork or vfork
  2022-01-17 16:27 [PATCH 0/8] Some fixes for handling vfork by multi-threaded programs Simon Marchi
                   ` (6 preceding siblings ...)
  2022-01-17 16:27 ` [PATCH 7/8] gdbserver: report correct status in thread stop race condition Simon Marchi
@ 2022-01-17 16:27 ` Simon Marchi
  2022-03-31 18:22   ` Pedro Alves
  2022-03-31 18:28   ` Pedro Alves
  7 siblings, 2 replies; 23+ messages in thread
From: Simon Marchi @ 2022-01-17 16:27 UTC (permalink / raw)
  To: gdb-patches; +Cc: Simon Marchi

From: Simon Marchi <simon.marchi@efficios.com>

The test introduced by this patch would fail in this configuration, with
the native-gdbserver or native-extended-gdbserver boards:

    FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-stop=auto: non-stop=off: displaced-stepping=auto: i=2: next to for loop

The problem is that the step operation is forgotten when handling the
fork/vfork.  With "debug infrun" and "debug remote", it looks like this
(some lines omitted for brevity).  We do the next:

    [infrun] proceed: enter
      [infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT
      [infrun] resume_1: step=1, signal=GDB_SIGNAL_0, trap_expected=0, current thread [4154304.4154304.0] at 0x5555555553bf
      [infrun] do_target_resume: resume_ptid=4154304.0.0, step=1, sig=GDB_SIGNAL_0
      [remote] Sending packet: $vCont;r5555555553bf,5555555553c4:p3f63c0.3f63c0;c:p3f63c0.-1#cd
    [infrun] proceed: exit

We then handle a fork event:

    [infrun] fetch_inferior_event: enter
      [remote] wait: enter
        [remote] Packet received: T05fork:p3f63ee.3f63ee;06:0100000000000000;07:b08e59f6ff7f0000;10:bf60e8f7ff7f0000;thread:p3f63c0.3f63c6;core:17;
      [remote] wait: exit
      [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
      [infrun] print_target_wait_results:   4154304.4154310.0 [Thread 4154304.4154310],
      [infrun] print_target_wait_results:   status->kind = FORKED, child_ptid = 4154350.4154350.0
      [infrun] handle_inferior_event: status->kind = FORKED, child_ptid = 4154350.4154350.0
      [remote] Sending packet: $D;3f63ee#4b
      [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [4154304.4154310.0] at 0x7ffff7e860bf
      [infrun] do_target_resume: resume_ptid=4154304.0.0, step=0, sig=GDB_SIGNAL_0
      [remote] Sending packet: $vCont;c:p3f63c0.-1#73
    [infrun] fetch_inferior_event: exit

In the first snippet, we resume the stepping thread with the range-stepping (r)
vCont command.  But after handling the fork (detaching the fork child), we
resumed the whole process freely.  The stepping thread, which was paused by
GDBserver while reporting the fork event, was therefore resumed freely, instead
of confined to the addresses of the stepped line.  Note that since this
is a "next", it could be that we have entered a function, installed a
step-resume breakpoint, and it's ok to continue freely the stepping
thread, but that's not the case here.  The two snippets shown above were
next to each other in the logs.

For the fork case, we can resume stepping right after handling the
event.

However, for the vfork case, where we are waiting for the
external child process to exec or exit, we only resume the thread that
called vfork, and keep the others stopped (see patch "gdb: fix handling of
vfork by multi-threaded program" prior in this series).  So we can't
resume the stepping thread right now.  Instead, do it after handling the
vfork-done event.

Change-Id: I92539c970397ce880110e039fe92b87480f816bd
---
 gdb/infrun.c                                  |  21 +++-
 .../gdb.threads/next-fork-other-thread.c      |  86 +++++++++++++
 .../gdb.threads/next-fork-other-thread.exp    | 116 ++++++++++++++++++
 3 files changed, 219 insertions(+), 4 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/next-fork-other-thread.c
 create mode 100644 gdb/testsuite/gdb.threads/next-fork-other-thread.exp

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 75252be22038..66708bc764c9 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -5744,7 +5744,16 @@ handle_inferior_event (struct execution_control_state *ecs)
 	  ecs->ptid = inferior_ptid;
 
 	  if (should_resume)
-	    keep_going (ecs);
+	    {
+	      /* Never call switch_back_to_stepped_thread if we are waiting for
+	         vfork-done (waiting for an external vfork child to exec or
+		 exit).  We will resume only the vforking thread for the purpose
+		 of collecting the vfork-done event, and we will restart any
+		 step once the critical shared address space window is done.  */
+	      if (parent->inf->thread_waiting_for_vfork_done != nullptr
+		  || !switch_back_to_stepped_thread (ecs))
+		keep_going (ecs);
+	    }
 	  else
 	    stop_waiting (ecs);
 	  return;
@@ -5764,9 +5773,13 @@ handle_inferior_event (struct execution_control_state *ecs)
       if (handle_stop_requested (ecs))
 	return;
 
-      /* This also takes care of reinserting breakpoints in the
-	 previously locked inferior.  */
-      keep_going (ecs);
+      if (!switch_back_to_stepped_thread (ecs))
+	{
+	  gdb_assert (inferior_thread () == ecs->event_thread);
+	  /* This also takes care of reinserting breakpoints in the
+	     previously locked inferior.  */
+	  keep_going (ecs);
+	}
       return;
 
     case TARGET_WAITKIND_EXECD:
diff --git a/gdb/testsuite/gdb.threads/next-fork-other-thread.c b/gdb/testsuite/gdb.threads/next-fork-other-thread.c
new file mode 100644
index 000000000000..6679f92c70c5
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/next-fork-other-thread.c
@@ -0,0 +1,86 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2022 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <errno.h>
+#include <assert.h>
+#include <limits.h>
+
+/* Number of threads doing forks.  */
+#define N_FORKERS 4
+
+static void *
+forker (void *arg)
+{
+  for (;;)
+    {
+      pid_t pid = FORK_FUNC ();
+
+      if (pid == 0)
+	_exit(11);
+
+      assert (pid > 0);
+
+      /* Wait for children to exit.  */
+      int ret;
+      int stat;
+      do {
+        ret = waitpid (pid, &stat, 0);
+      } while (ret == EINTR);
+
+      assert (ret == pid);
+      assert (WIFEXITED (stat));
+      assert (WEXITSTATUS (stat) == 11);
+
+      usleep (40 * 1000);
+    }
+
+  return NULL;
+}
+
+static void
+sleep_a_bit (void)
+{
+  usleep (1000 * 50);
+}
+
+int
+main (void)
+{
+  alarm (60);
+
+  pthread_t thread[N_FORKERS];
+  for (int i = 0; i < N_FORKERS; ++i)
+    {
+      int ret = pthread_create (&thread[i], NULL, forker, NULL);
+      assert (ret == 0);
+    }
+
+  for (int i = 0; i < INT_MAX; ++i) /* for loop */
+    {
+      sleep_a_bit ();  /* break here */
+      sleep_a_bit ();  /* other line */
+    }
+
+  for (int i = 0; i < N_FORKERS; ++i)
+    pthread_join (thread[i], NULL);
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/next-fork-other-thread.exp b/gdb/testsuite/gdb.threads/next-fork-other-thread.exp
new file mode 100644
index 000000000000..5960bb9ba100
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/next-fork-other-thread.exp
@@ -0,0 +1,116 @@
+# Copyright 2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test doing a "next" on a thread during which forks or vforks happen in other
+# threads.
+
+standard_testfile
+
+# Line where to stop the main thread.
+set break_here_line [gdb_get_line_number "break here"]
+
+# Build executables, one for each fork flavor.
+foreach_with_prefix fork_func {fork vfork} {
+    set opts [list debug pthreads additional_flags=-DFORK_FUNC=${fork_func}]
+    if { [build_executable "failed to prepare" \
+	    ${testfile}-${fork_func} ${srcfile} $opts] } {
+	return
+    }
+}
+
+# If testing against GDBserver, consume all it its output.
+
+proc drain_gdbserver_output { } {
+    if { [info exists ::server_spawn_id] } {
+	gdb_test_multiple "" "" {
+	    -i "$::server_spawn_id"
+	    -timeout 0
+	    -re ".+" {
+	      exp_continue
+	    }
+	}
+    }
+}
+
+# Run the test with the given parameters:
+#
+#   - FORK_FUNC: fork flavor, "fork" or "vfork".
+#   - TARGET-NON-STOP: "maintenance set target-non-stop" value, "auto", "on" or
+#     "off".
+#   - NON-STOP: "set non-stop" value, "on" or "off".
+#   - DISPLACED-STEPPING: "set displaced-stepping" value, "auto", "on" or "off".
+
+proc do_test { fork_func target-non-stop non-stop displaced-stepping } {
+    save_vars { ::GDBFLAGS } {
+	append ::GDBFLAGS " -ex \"maintenance set target-non-stop ${target-non-stop}\""
+	append ::GDBFLAGS " -ex \"set non-stop ${non-stop}\""
+	clean_restart ${::binfile}-${fork_func}
+    }
+
+    gdb_test_no_output "set displaced-stepping ${displaced-stepping}"
+
+    if { ![runto_main] } {
+	return
+    }
+
+    # The "Detached after (v)fork" messages get in the way in non-stop, disable
+    # them.
+    gdb_test_no_output "set print inferior-events off"
+
+    # Advance the next-ing thread to the point where we'll execute the nexts.
+    # Leave the breakpoint in: it will force GDB to step over it while next-ing,
+    # which exercises some additional code paths.
+    gdb_test "break $::break_here_line" "Breakpoint $::decimal at $::hex.*"
+    gdb_test "continue" "hit Breakpoint $::decimal, main.*"
+
+    # Next an arbitrary number of times over the lines of the loop.
+    #
+    # It is useful to bump this number to a larger value (e.g. 200) to stress
+    # test more, but it makes the test case run for considerably longer.  If
+    # you increase the number of loops, you might want to adjust the alarm
+    # time in the .c file accordingly.
+    for { set i 0 } { $i < 20 } { incr i } {
+	# If testing against GDBserver, the forking threads cause a lot of
+	# "Detaching from process XYZ" messages to appear.  If we don't consume
+	# that output, GDBserver eventually blocks on a full stderr.  Drain it
+	# once every loop.  It may not be needed for 20 iterations, but it's
+	# needed if you increase to 200 iterations.
+	drain_gdbserver_output
+
+	with_test_prefix "i=$i" {
+	    if { [gdb_test "next" "other line.*" "next to other line"] != 0 } {
+		return
+	    }
+
+	    if { [gdb_test "next" "for loop.*" "next to for loop"] != 0 } {
+		return
+	    }
+
+	    if { [gdb_test "next" "break here.*" "next to break here"] != 0} {
+		return
+	    }
+	}
+    }
+}
+
+foreach_with_prefix fork_func {fork vfork} {
+    foreach_with_prefix target-non-stop {auto on off} {
+	foreach_with_prefix non-stop {off on} {
+	    foreach_with_prefix displaced-stepping {auto on off} {
+		do_test ${fork_func} ${target-non-stop} ${non-stop} ${displaced-stepping}
+	    }
+	}
+    }
+}
-- 
2.34.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/8] gdb/infrun: add reason parameter to stop_all_threads
  2022-01-17 16:27 ` [PATCH 1/8] gdb/infrun: add reason parameter to stop_all_threads Simon Marchi
@ 2022-03-31 15:05   ` Pedro Alves
  2022-03-31 15:35     ` Simon Marchi
  0 siblings, 1 reply; 23+ messages in thread
From: Pedro Alves @ 2022-03-31 15:05 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Simon Marchi

On 2022-01-17 16:27, Simon Marchi via Gdb-patches wrote:
> From: Simon Marchi <simon.marchi@efficios.com>
> 
> Add a "reason" parameter, only used to show in debug messages what is
> the reason for stopping all threads.  This helped me understand the
> debug logs while adding some new uses of stop_all_threads, so I am
> proposing to merge it.

OK.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/8] gdb/infrun: add reason parameter to stop_all_threads
  2022-03-31 15:05   ` Pedro Alves
@ 2022-03-31 15:35     ` Simon Marchi
  0 siblings, 0 replies; 23+ messages in thread
From: Simon Marchi @ 2022-03-31 15:35 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches; +Cc: Simon Marchi



On 2022-03-31 11:05, Pedro Alves wrote:
> On 2022-01-17 16:27, Simon Marchi via Gdb-patches wrote:
>> From: Simon Marchi <simon.marchi@efficios.com>
>>
>> Add a "reason" parameter, only used to show in debug messages what is
>> the reason for stopping all threads.  This helped me understand the
>> debug logs while adding some new uses of stop_all_threads, so I am
>> proposing to merge it.
> 
> OK.

Pushed on its own, given it improves the debug messages in general.

Simon

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/8] gdb/linux-nat: remove check based on current_inferior in linux_handle_extended_wait
  2022-01-17 16:27 ` [PATCH 2/8] gdb/linux-nat: remove check based on current_inferior in linux_handle_extended_wait Simon Marchi
@ 2022-03-31 16:12   ` Pedro Alves
  2022-03-31 17:06     ` Simon Marchi
  0 siblings, 1 reply; 23+ messages in thread
From: Pedro Alves @ 2022-03-31 16:12 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 2022-01-17 16:27, Simon Marchi via Gdb-patches wrote:

> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index b926fb5eba9e..7de4da03a34f 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -2096,20 +2096,11 @@ linux_handle_extended_wait (struct lwp_info *lp, int status)
>  
>    if (event == PTRACE_EVENT_VFORK_DONE)
>      {
> -      if (current_inferior ()->waiting_for_vfork_done)
> -	{
> -	  linux_nat_debug_printf
> -	    ("Got expected PTRACE_EVENT_VFORK_DONE from LWP %ld: stopping",
> -	     lp->ptid.lwp ());
> -
> -	  ourstatus->set_vfork_done ();
> -	  return 0;
> -	}
> -
>        linux_nat_debug_printf
> -	("Got PTRACE_EVENT_VFORK_DONE from LWP %ld: ignoring", lp->ptid.lwp ());
> -
> -      return 1;
> +	("Got expected PTRACE_EVENT_VFORK_DONE from LWP %ld",
> +	 lp->ptid.lwp ());

Please drop the "expected" from the log, as we're not checking 
waiting_for_vfork_done anymore.

> +	  /* Child */
> +	  _exit(12);

Space before parens.

Otherwise OK.  Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/8] gdb/linux-nat: remove check based on current_inferior in linux_handle_extended_wait
  2022-03-31 16:12   ` Pedro Alves
@ 2022-03-31 17:06     ` Simon Marchi
  0 siblings, 0 replies; 23+ messages in thread
From: Simon Marchi @ 2022-03-31 17:06 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches



On 2022-03-31 12:12, Pedro Alves wrote:
> On 2022-01-17 16:27, Simon Marchi via Gdb-patches wrote:
> 
>> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
>> index b926fb5eba9e..7de4da03a34f 100644
>> --- a/gdb/linux-nat.c
>> +++ b/gdb/linux-nat.c
>> @@ -2096,20 +2096,11 @@ linux_handle_extended_wait (struct lwp_info *lp, int status)
>>  
>>    if (event == PTRACE_EVENT_VFORK_DONE)
>>      {
>> -      if (current_inferior ()->waiting_for_vfork_done)
>> -	{
>> -	  linux_nat_debug_printf
>> -	    ("Got expected PTRACE_EVENT_VFORK_DONE from LWP %ld: stopping",
>> -	     lp->ptid.lwp ());
>> -
>> -	  ourstatus->set_vfork_done ();
>> -	  return 0;
>> -	}
>> -
>>        linux_nat_debug_printf
>> -	("Got PTRACE_EVENT_VFORK_DONE from LWP %ld: ignoring", lp->ptid.lwp ());
>> -
>> -      return 1;
>> +	("Got expected PTRACE_EVENT_VFORK_DONE from LWP %ld",
>> +	 lp->ptid.lwp ());
> 
> Please drop the "expected" from the log, as we're not checking 
> waiting_for_vfork_done anymore.

Done.

> 
>> +	  /* Child */
>> +	  _exit(12);
> 
> Space before parens.

Done.

> Otherwise OK.  Thanks.

Thanks, I'll push this patch on its own, since it's self-contained.

Simon

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/8] gdb: replace inferior::waiting_for_vfork_done with inferior::thread_waiting_for_vfork_done
  2022-01-17 16:27 ` [PATCH 3/8] gdb: replace inferior::waiting_for_vfork_done with inferior::thread_waiting_for_vfork_done Simon Marchi
@ 2022-03-31 18:17   ` Pedro Alves
  2022-04-01 14:25     ` Simon Marchi
  0 siblings, 1 reply; 23+ messages in thread
From: Pedro Alves @ 2022-03-31 18:17 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 2022-01-17 16:27, Simon Marchi via Gdb-patches wrote:
> The inferior::waiting_for_vfork_done flag indicates that some thread in
> that inferior is waiting for a vfork-done event.  Subsequent patches
> will need to know which thread is waiting for that event.
> 
> I think there is a latent buglet in that waiting_for_vfork_done is
> currently not reset on inferior exec or exit.  I could imagine that if a
> thread in the parent process calls exec or exit while another thread of
> the parent process is waiting for its vfork child to exec or exit, we
> could end up with inferior::waiting_for_vfork_done without a thread
> actually waiting for a vfork-done event anymore.  And since that flag is
> checked in resume_1, things could misbehave there.
> 
> Since the new field points to a thread_info object, and those are
> destroyed on exec or exit, it could be worse now since we could try to
> access freed memory, if thread_waiting_for_vfork_done were to point to a
> stale thread_info.  To avoid this, clear the field in
> infrun_inferior_exit and infrun_inferior_execd.

This is OK, please mention explicitly in the body of the commit log that this
is replacing the boolean with a thread_info pointer.  Not having that
actually gave me pause.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/8] gdb/infrun: add inferior parameters to stop_all_threads and restart_threads
  2022-01-17 16:27 ` [PATCH 4/8] gdb/infrun: add inferior parameters to stop_all_threads and restart_threads Simon Marchi
@ 2022-03-31 18:17   ` Pedro Alves
  0 siblings, 0 replies; 23+ messages in thread
From: Pedro Alves @ 2022-03-31 18:17 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 2022-01-17 16:27, Simon Marchi via Gdb-patches wrote:
> A following patch will want to stop all threads of a given inferior (as
> opposed to all threads of all inferiors) while handling a vfork, and
> restart them after.  To help with this, add inferior parameters to
> stop_all_threads and restart_threads.  This is done as a separate patch
> to make sure this doesn't cause regressions on its own, and to keep the
> following patches more concise.
> 
> No visible changes are expected here, since all calls sites pass
> nullptr, which should keep the existing behavior.
> 

OK.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/8] gdb/infrun: add logging statement to do_target_resume
  2022-01-17 16:27 ` [PATCH 5/8] gdb/infrun: add logging statement to do_target_resume Simon Marchi
@ 2022-03-31 18:18   ` Pedro Alves
  0 siblings, 0 replies; 23+ messages in thread
From: Pedro Alves @ 2022-03-31 18:18 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 2022-01-17 16:27, Simon Marchi via Gdb-patches wrote:
> This helped me, it shows which ptid we actually call target_resume with.
> 

OK.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/8] gdb: fix handling of vfork by multi-threaded program (follow-fork-mode=parent, detach-on-fork=on)
  2022-01-17 16:27 ` [PATCH 6/8] gdb: fix handling of vfork by multi-threaded program (follow-fork-mode=parent, detach-on-fork=on) Simon Marchi
@ 2022-03-31 18:21   ` Pedro Alves
  2022-04-01 17:28     ` Simon Marchi
  0 siblings, 1 reply; 23+ messages in thread
From: Pedro Alves @ 2022-03-31 18:21 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Simon Marchi

On 2022-01-17 16:27, Simon Marchi via Gdb-patches wrote:
> From: Simon Marchi <simon.marchi@efficios.com>
> 
> There is a problem with how GDB handles a vfork happening in a
> multi-threaded program.  This problem was reported to me by somebody not
> using vfork directly, but using system(3) in a multi-threaded program,
> which may be implemented using vfork.
> 
> This patch only deals about the follow-fork-mode=parent,
> detach-on-fork=on case, because it would be too much to chew at once to
> fix the bugs in the other cases as well (I tried).

Thinking through this, one concern I had is that if the vfork child gets stuck for some reason
before execing or exiting (hey we're a debugger, bugs happen!), then pausing all threads makes
it so that you can end up stuck in gdb and can't get back the prompt (if the child ignores
SIGINT, otherwise it just dies), or if you continue in the background, you
can't interrupt the parent threads.

E.g., with your series, and:

diff --git c/gdb/testsuite/gdb.threads/vfork-multi-thread.c w/gdb/testsuite/gdb.threads/vfork-multi-thread.c
index cd3dfcc1e65..6178de91dca 100644
--- c/gdb/testsuite/gdb.threads/vfork-multi-thread.c
+++ w/gdb/testsuite/gdb.threads/vfork-multi-thread.c
@@ -25,6 +25,8 @@ static void *vforker(void *arg)
     /* A vfork child is not supposed to mess with the state of the program,
        but it is helpful for the purpose of this test.  */
     release_main = 1;
+    while (1)
+      sleep (1);
     _exit(7);
   }
 
@@ -41,7 +43,7 @@ static void should_break_here(void) {}
 
 int main()
 {
-
+  signal (SIGINT, SIG_IGN);
   pthread_t thread;
   int ret = pthread_create(&thread, NULL, vforker, NULL);
   assert(ret == 0);
(END)
 

We get:

(gdb) r
Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/vfork-multi-thread/vfork-multi-thread 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7d8a700 (LWP 1713229)]
[Detaching after vfork from child process 1713230]
^C^C^C^C^C^C
(hung forever)

Or:

$ gdb -ex "set non-stop on" ...
(gdb) r&
Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/vfork-multi-thread/vfork-multi-thread 
(gdb) [Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7d8a700 (LWP 1345571)]
[Detaching after vfork from child process 1345572]
info threads 
  Id   Target Id                                             Frame 
* 1    Thread 0x7ffff7d8b740 (LWP 1345567) "vfork-multi-thr" (running)
  2    Thread 0x7ffff7d8a700 (LWP 1345571) "vfork-multi-thr" (running)
(gdb) interrupt 
*nothing*
(gdb) interrupt 
*nothing*
(gdb) info threads 
  Id   Target Id                                             Frame 
* 1    Thread 0x7ffff7d8b740 (LWP 1345567) "vfork-multi-thr" (running)
  2    Thread 0x7ffff7d8a700 (LWP 1345571) "vfork-multi-thr" (running)
(gdb) 


And then you can't quit gdb:

(gdb) q
A debugging session is active.

        Inferior 1 [process 1345567] will be killed.

Quit anyway? (y or n) y

^C^C^C^C
(hung forever)


I guess the ^C issue will be fixed by my series reworking ctrl-c, leaving
gdb in control of the terminal at all times..


Bah, I guess that this might be a meh/schrug, since if the parent is
single-threaded, then you can't interrupt it either.  Though at least GDB
doesn't get you get into the stuck state.


I wonder whether a better solution would be something completely different.
Like, say, make it so that gdb always remains attached to the child until it
execs/exits, in which case we would no longer need to remove breakpoints from the
parent/child, and thus we could let all the parent's threads run as well.

We actually do that for follow-fork parent / detach-on-fork-on and
vfork -- see inferior::pending_detach.


Anyhow, onward.  With your currently solution at least the common scenario
works, and we have a new testcase that helps whatever redesign in future.


Despite these concerns, I'm actually happy to get this into the tree.  I think
it's still better than what we have today.

> +
> +  /* The rest of the function assumes non-stop==off and
> +     target-non-stop==off.
> +
> +     If a thread is waiting for a vfork-done event, it means breakpoints are out
> +     for this inferior (well, program space in fact).  We don't want to resume
> +     any thread other than the one waiting for vfork done, otherwise these other
> +     threads could miss breakpoints.  So if a thread in the resumption set is
> +     waiting for a vfork-done event, resume only that thread.
> +
> +     The resumption set width depends on whether schedule-multiple is on or off.
> +
> +     Note that if the target_resume interface was more flexible, we could be
> +     smarter here when schedule-multiple is on.  For example, imagine 3
> +     inferiors with 2 threads each (1.1, 1.2, 2.1, 2.2, 3.1 and 3.2).  Threads
> +     2.1 and 3.2 are both waiting for a vfork-done event.  Then we could ask the
> +     target(s) to resume:
> +
> +      - All threads of inferior 1
> +      - Thread 2.1
> +      - Thread 3.2
> +
> +     Since we don't have that flexibility (we can only pass one ptid), just
> +     resume the first thread waiting for a vfork-done event we find (e.g. thread
> +     2.1).  */

We actually have the flexibility, but we don't use it.  I mean, we could use
the commit_resumed mechanism in all-stop as well, which would let us prepare such
a resume with multiple target_resume calls and final commit.  For a rainy day.


> +  if (sched_multi)
> +    {
> +      for (inferior *inf : all_non_exited_inferiors ())
> +	if (inf->thread_waiting_for_vfork_done != nullptr)
> +	  return inf->thread_waiting_for_vfork_done->ptid;
> +    }
> +  else if (current_inferior ()->thread_waiting_for_vfork_done != nullptr)
> +    return current_inferior ()->thread_waiting_for_vfork_done->ptid;
> +
> +  return user_visible_resume_ptid (user_step);
>  }
>  
>  /* Wrapper for target_resume, that handles infrun-specific
> @@ -3258,6 +3360,19 @@ proceed (CORE_ADDR addr, enum gdb_signal siggnal)
>  		continue;


> diff --git a/gdb/testsuite/gdb.threads/vfork-multi-thread.c b/gdb/testsuite/gdb.threads/vfork-multi-thread.c
> new file mode 100644
> index 000000000000..cd3dfcc1e653
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/vfork-multi-thread.c
> @@ -0,0 +1,74 @@
> +#include <assert.h>

Missing copyright header.

> +#include <pthread.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/wait.h>
> +
> +// Start with:
> +//
> +//   ./gdb -nx -q --data-directory=data-directory repro -ex "tb break_here_first" -ex r -ex "b should_break_here"
> +//
> +// Then do "continue".
> +//
> +// The main thread will likely cross should_break_here while we are handling
> +// the vfork and breakpoints are removed, therefore missing the breakpoint.

The whole file needs a pass to adjust it to GNU conventions/formatting.

> +
> +static volatile int release_vfork = 0;
> +static volatile int release_main = 0;
> +
> +static void *vforker(void *arg)
> +{
> +  while (!release_vfork);

I'd add some usleep(1) here, to avoid hogging the cpu.

> +
> +  pid_t pid = vfork ();
> +  if (pid == 0) {
> +    /* A vfork child is not supposed to mess with the state of the program,
> +       but it is helpful for the purpose of this test.  */
> +    release_main = 1;
> +    _exit(7);
> +  }
> +
> +  int stat;
> +  int ret = waitpid (pid, &stat, 0);
> +  assert (ret == pid);
> +  assert (WIFEXITED (stat));
> +  assert (WEXITSTATUS (stat) == 7);
> +
> +  return NULL;
> +}
> +
> +static void should_break_here(void) {}
> +
> +int main()
> +{
> +
> +  pthread_t thread;
> +  int ret = pthread_create(&thread, NULL, vforker, NULL);
> +  assert(ret == 0);
> +
> +  /* We break here first, while the thread is stuck on `!release_fork`.  */
> +  release_vfork = 1;
> +
> +  /* We set a breakpoint on should_break_here.
> +
> +     We then set "release_fork" from the debugger and continue.  The main
> +     thread hangs on `!release_main` while the non-main thread vforks.  During
> +     the window of time where the two processes have a shared address space
> +     (after vfork, before _exit), GDB removes the breakpoints from the address
> +     space.  During that window, only the vfork-ing thread (the non-main
> +     thread) is frozen by the kernel.  The main thread is free to execute.  The
> +     child process sets `release_main`, releasing the main thread. A buggy GDB
> +     would let the main thread execute during that window, leading to the
> +     breakpoint on should_break_here being missed.  A fixed GDB does not resume
> +     the threads of the vforking process other than the vforking thread.  When
> +     the vfork child exits, the fixed GDB resumes the main thread, after
> +     breakpoints are reinserted, so the breakpoint is not missed.  */
> +
> +  while (!release_main);

Ditto, usleep.

Pedro Alves

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/8] gdbserver: report correct status in thread stop race condition
  2022-01-17 16:27 ` [PATCH 7/8] gdbserver: report correct status in thread stop race condition Simon Marchi
@ 2022-03-31 18:21   ` Pedro Alves
  0 siblings, 0 replies; 23+ messages in thread
From: Pedro Alves @ 2022-03-31 18:21 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Simon Marchi

On 2022-01-17 16:27, Simon Marchi via Gdb-patches wrote:
> From: Simon Marchi <simon.marchi@efficios.com>
> 
> The test introduced by the following patch would sometimes fail in this
> configuration:
> 
>     FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=vfork: target-non-stop=on: non-stop=off: displaced-stepping=auto: i=14: next to for loop
> 
> The test has multiple threads constantly forking or vforking while the
> main thread keep doing "next"s.
> 
> (After writing the commit message, I realized this also fixes a similar
> failure in gdb.threads/forking-threads-plus-breakpoint.exp with the
> native-gdbserver and native-extended-gdbserver boards.)
> 
> As stop_all_threads is called, because the main thread finished its
> "next", it inevitably happens at some point that we ask the remote
> target to stop a thread and wait() reports that this thread stopped with
> a fork or vfork event, instead of the SIGSTOP we sent to try to stop it.
> 
> While running this test, I attached to GDBserver and stopped at
> linux-low.cc:3626.  We can see that the status pulled from the kernel
> for 2742805 is indeed a vfork event:
> 
>     (gdb) p/x w
>     $3 = 0x2057f
>     (gdb) p WIFSTOPPED(w)
>     $4 = true
>     (gdb) p WSTOPSIG(w)
>     $5 = 5
>     (gdb) p/x (w >> 8) & (PTRACE_EVENT_VFORK << 8)
>     $6 = 0x200
> 
> However, the statement at line 3626 overrides that:
> 
>     ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));
> 
> OURSTATUS becomes "stopped by a SIGTRAP".  The information about the
> fork or vfork is lost.
> 
> It's then all downhill from there, stop_all_threads eventually asks for
> a thread list update.  That thread list includes the child of that
> forgotten fork or vfork, the remote target goes "oh cool, a new process,
> let's attach to it!", when in fact that vfork child's destiny was to be
> detached.
> 
> My reverse-engineered understanding of the code around there is that the
> if/else between lines 3562 and 3583 (in the original code) makes sure
> OURSTATUS is always initialized (not "ignore").  Either the details are
> already in event_child->waitstatus (in the case of fork/vfork, for
> example), in which case we just copy event_child->waitstatus to
> ourstatus.  Or, if the event is a plain "stopped by a signal" or a
> syscall event, OURSTATUS is set to "stopped", but without a signal
> number.  Lines 3601 to 3629 (in the original code) serve to fill in that
> last bit of information.
> 
> The problem is that when `w` holds the vfork status, the code wrongfully
> takes this branch, because WSTOPSIG(w) returns SIGTRAP:
> 
>   else if (current_thread->last_resume_kind == resume_stop
>        && WSTOPSIG (w) != SIGSTOP)
> 
> The intent of this branch is, for example, when we sent SIGSTOP to try
> to stop a thread, but wait() reports that it stopped with another signal
> (that it must have received from somewhere else simultaneously), say
> SIGWINCH.  In that case, we want to report the SIGWINCH.  But in our
> fork/vfork case, we don't want to take this branch, as the thread didn't
> really stop because it received a signal.  For the non "stopped by a
> signal" and non "syscall signal" cases, we would ideally skip over all
> that snippet that fills in the signal or syscall number.
> 
> The fix I propose is to move this snipppet of the else branch of the
> if/else above.  In addition to moving the code, the last two "else if"
> branches:
> 
>   else if (current_thread->last_resume_kind == resume_stop
> 	   && WSTOPSIG (w) != SIGSTOP)
>     {
>       /* A thread that has been requested to stop by GDB with vCont;t,
> 	 but, it stopped for other reasons.  */
>       ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));
>     }
>   else if (ourstatus->kind () == TARGET_WAITKIND_STOPPED)
>     ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));
> 
> are changed into a single else:
> 
>   else
>     ourstatus->set_stopped (gdb_signal_from_host (WSTOPSIG (w)));
> 
> This is the default path we take if:
> 
>  - W is not a syscall status
>  - W does not represent a SIGSTOP that have sent to stop the thread and
>    therefore want to suppress it
> 
> Change-Id: If2dc1f0537a549c293f7fa3c53efd00e3e194e79

This is OK.  Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/8] gdb: resume ongoing step after handling fork or vfork
  2022-01-17 16:27 ` [PATCH 8/8] gdb: resume ongoing step after handling fork or vfork Simon Marchi
@ 2022-03-31 18:22   ` Pedro Alves
  2022-03-31 18:28   ` Pedro Alves
  1 sibling, 0 replies; 23+ messages in thread
From: Pedro Alves @ 2022-03-31 18:22 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Simon Marchi

On 2022-01-17 16:27, Simon Marchi via Gdb-patches wrote:
> From: Simon Marchi <simon.marchi@efficios.com>
> 
> The test introduced by this patch would fail in this configuration, with
> the native-gdbserver or native-extended-gdbserver boards:
> 
>     FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-stop=auto: non-stop=off: displaced-stepping=auto: i=2: next to for loop
> 
> The problem is that the step operation is forgotten when handling the
> fork/vfork.  With "debug infrun" and "debug remote", it looks like this
> (some lines omitted for brevity).  We do the next:
> 
>     [infrun] proceed: enter
>       [infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT
>       [infrun] resume_1: step=1, signal=GDB_SIGNAL_0, trap_expected=0, current thread [4154304.4154304.0] at 0x5555555553bf
>       [infrun] do_target_resume: resume_ptid=4154304.0.0, step=1, sig=GDB_SIGNAL_0
>       [remote] Sending packet: $vCont;r5555555553bf,5555555553c4:p3f63c0.3f63c0;c:p3f63c0.-1#cd
>     [infrun] proceed: exit
> 
> We then handle a fork event:
> 
>     [infrun] fetch_inferior_event: enter
>       [remote] wait: enter
>         [remote] Packet received: T05fork:p3f63ee.3f63ee;06:0100000000000000;07:b08e59f6ff7f0000;10:bf60e8f7ff7f0000;thread:p3f63c0.3f63c6;core:17;
>       [remote] wait: exit
>       [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
>       [infrun] print_target_wait_results:   4154304.4154310.0 [Thread 4154304.4154310],
>       [infrun] print_target_wait_results:   status->kind = FORKED, child_ptid = 4154350.4154350.0
>       [infrun] handle_inferior_event: status->kind = FORKED, child_ptid = 4154350.4154350.0
>       [remote] Sending packet: $D;3f63ee#4b
>       [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [4154304.4154310.0] at 0x7ffff7e860bf
>       [infrun] do_target_resume: resume_ptid=4154304.0.0, step=0, sig=GDB_SIGNAL_0
>       [remote] Sending packet: $vCont;c:p3f63c0.-1#73
>     [infrun] fetch_inferior_event: exit
> 
> In the first snippet, we resume the stepping thread with the range-stepping (r)
> vCont command.  But after handling the fork (detaching the fork child), we
> resumed the whole process freely.  The stepping thread, which was paused by
> GDBserver while reporting the fork event, was therefore resumed freely, instead
> of confined to the addresses of the stepped line.  Note that since this
> is a "next", it could be that we have entered a function, installed a
> step-resume breakpoint, and it's ok to continue freely the stepping
> thread, but that's not the case here.  The two snippets shown above were
> next to each other in the logs.
> 
> For the fork case, we can resume stepping right after handling the
> event.
> 
> However, for the vfork case, where we are waiting for the
> external child process to exec or exit, we only resume the thread that
> called vfork, and keep the others stopped (see patch "gdb: fix handling of
> vfork by multi-threaded program" prior in this series).  So we can't
> resume the stepping thread right now.  Instead, do it after handling the
> vfork-done event.

Also OK.

Thanks for doing all this.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/8] gdb: resume ongoing step after handling fork or vfork
  2022-01-17 16:27 ` [PATCH 8/8] gdb: resume ongoing step after handling fork or vfork Simon Marchi
  2022-03-31 18:22   ` Pedro Alves
@ 2022-03-31 18:28   ` Pedro Alves
  2022-04-01 18:42     ` Simon Marchi
  1 sibling, 1 reply; 23+ messages in thread
From: Pedro Alves @ 2022-03-31 18:28 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Simon Marchi

[Lol, I reviewed this one twice, and the first time forgot to press send.]

This is OK.  Small nits in the testcase below.

On 2022-01-17 16:27, Simon Marchi via Gdb-patches wrote:

> +#include <pthread.h>
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <errno.h>
> +#include <assert.h>
> +#include <limits.h>
> +
> +/* Number of threads doing forks.  */
> +#define N_FORKERS 4
> +
> +static void *
> +forker (void *arg)
> +{
> +  for (;;)
> +    {
> +      pid_t pid = FORK_FUNC ();
> +
> +      if (pid == 0)
> +	_exit(11);

Missing space before parens.

> +
> +      assert (pid > 0);
> +
> +      /* Wait for children to exit.  */
> +      int ret;
> +      int stat;
> +      do {

{ on the next line.  Thus:

> +        ret = waitpid (pid, &stat, 0);
> +      } while (ret == EINTR);
> +
> +      assert (ret == pid);
> +      assert (WIFEXITED (stat));
> +      assert (WEXITSTATUS (stat) == 11);
> +
> +      usleep (40 * 1000);

Why not sleep_a_bit()?  Is the fact that it's a little less time significant?

> +    }
> +
> +  return NULL;
> +}
> +
> +static void
> +sleep_a_bit (void)
> +{
> +  usleep (1000 * 50);
> +}
> +
> +int
> +main (void)
> +{
> +  alarm (60);
> +
> +  pthread_t thread[N_FORKERS];
> +  for (int i = 0; i < N_FORKERS; ++i)

Declare "int i" outside the loop so this compiles as C90 as well.  Ditto below.

> +    {
> +      int ret = pthread_create (&thread[i], NULL, forker, NULL);
> +      assert (ret == 0);
> +    }
> +
> +  for (int i = 0; i < INT_MAX; ++i) /* for loop */
> +    {
> +      sleep_a_bit ();  /* break here */
> +      sleep_a_bit ();  /* other line */
> +    }
> +
> +  for (int i = 0; i < N_FORKERS; ++i)
> +    pthread_join (thread[i], NULL);
> +
> +  return 0;
> +}

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/8] gdb: replace inferior::waiting_for_vfork_done with inferior::thread_waiting_for_vfork_done
  2022-03-31 18:17   ` Pedro Alves
@ 2022-04-01 14:25     ` Simon Marchi
  0 siblings, 0 replies; 23+ messages in thread
From: Simon Marchi @ 2022-04-01 14:25 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches



On 2022-03-31 14:17, Pedro Alves wrote:
> On 2022-01-17 16:27, Simon Marchi via Gdb-patches wrote:
>> The inferior::waiting_for_vfork_done flag indicates that some thread in
>> that inferior is waiting for a vfork-done event.  Subsequent patches
>> will need to know which thread is waiting for that event.
>>
>> I think there is a latent buglet in that waiting_for_vfork_done is
>> currently not reset on inferior exec or exit.  I could imagine that if a
>> thread in the parent process calls exec or exit while another thread of
>> the parent process is waiting for its vfork child to exec or exit, we
>> could end up with inferior::waiting_for_vfork_done without a thread
>> actually waiting for a vfork-done event anymore.  And since that flag is
>> checked in resume_1, things could misbehave there.
>>
>> Since the new field points to a thread_info object, and those are
>> destroyed on exec or exit, it could be worse now since we could try to
>> access freed memory, if thread_waiting_for_vfork_done were to point to a
>> stale thread_info.  To avoid this, clear the field in
>> infrun_inferior_exit and infrun_inferior_execd.
> 
> This is OK, please mention explicitly in the body of the commit log that this
> is replacing the boolean with a thread_info pointer.  Not having that
> actually gave me pause.

Ok, done locally.

Simon

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/8] gdb: fix handling of vfork by multi-threaded program (follow-fork-mode=parent, detach-on-fork=on)
  2022-03-31 18:21   ` Pedro Alves
@ 2022-04-01 17:28     ` Simon Marchi
  0 siblings, 0 replies; 23+ messages in thread
From: Simon Marchi @ 2022-04-01 17:28 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches; +Cc: Simon Marchi

> Thinking through this, one concern I had is that if the vfork child gets stuck for some reason
> before execing or exiting (hey we're a debugger, bugs happen!), then pausing all threads makes
> it so that you can end up stuck in gdb and can't get back the prompt (if the child ignores
> SIGINT, otherwise it just dies), or if you continue in the background, you
> can't interrupt the parent threads.
> 
> E.g., with your series, and:
> 
> diff --git c/gdb/testsuite/gdb.threads/vfork-multi-thread.c w/gdb/testsuite/gdb.threads/vfork-multi-thread.c
> index cd3dfcc1e65..6178de91dca 100644
> --- c/gdb/testsuite/gdb.threads/vfork-multi-thread.c
> +++ w/gdb/testsuite/gdb.threads/vfork-multi-thread.c
> @@ -25,6 +25,8 @@ static void *vforker(void *arg)
>      /* A vfork child is not supposed to mess with the state of the program,
>         but it is helpful for the purpose of this test.  */
>      release_main = 1;
> +    while (1)
> +      sleep (1);
>      _exit(7);
>    }
>  
> @@ -41,7 +43,7 @@ static void should_break_here(void) {}
>  
>  int main()
>  {
> -
> +  signal (SIGINT, SIG_IGN);
>    pthread_t thread;
>    int ret = pthread_create(&thread, NULL, vforker, NULL);
>    assert(ret == 0);
> (END)
>  
> 
> We get:
> 
> (gdb) r
> Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/vfork-multi-thread/vfork-multi-thread 
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> [New Thread 0x7ffff7d8a700 (LWP 1713229)]
> [Detaching after vfork from child process 1713230]
> ^C^C^C^C^C^C
> (hung forever)
> 
> Or:
> 
> $ gdb -ex "set non-stop on" ...
> (gdb) r&
> Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/vfork-multi-thread/vfork-multi-thread 
> (gdb) [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> [New Thread 0x7ffff7d8a700 (LWP 1345571)]
> [Detaching after vfork from child process 1345572]
> info threads 
>   Id   Target Id                                             Frame 
> * 1    Thread 0x7ffff7d8b740 (LWP 1345567) "vfork-multi-thr" (running)
>   2    Thread 0x7ffff7d8a700 (LWP 1345571) "vfork-multi-thr" (running)
> (gdb) interrupt 
> *nothing*
> (gdb) interrupt 
> *nothing*
> (gdb) info threads 
>   Id   Target Id                                             Frame 
> * 1    Thread 0x7ffff7d8b740 (LWP 1345567) "vfork-multi-thr" (running)
>   2    Thread 0x7ffff7d8a700 (LWP 1345571) "vfork-multi-thr" (running)
> (gdb) 
> 
> 
> And then you can't quit gdb:
> 
> (gdb) q
> A debugging session is active.
> 
>         Inferior 1 [process 1345567] will be killed.
> 
> Quit anyway? (y or n) y
> 
> ^C^C^C^C
> (hung forever)

Ah yeah... that sucks.

We can find a ton of cases where we can deadlock when dealing with stuck
vfork children.  You can run the above program on the shell and try
attaching the parent, for example.

> I guess the ^C issue will be fixed by my series reworking ctrl-c, leaving
> gdb in control of the terminal at all times..

What will happen then?  GDB receives the SIGINT, then sends a SIGSTOP to
the vfork parent thread.  Do you know if a vfork parent thread can be
stopped at that point (while the vfork child thread is stuck in an
infinite loop)?

> Bah, I guess that this might be a meh/schrug, since if the parent is
> single-threaded, then you can't interrupt it either.  Though at least GDB
> doesn't get you get into the stuck state.

What do you mean "doesn't get you into the stuck state"?  I don't see
what's different here between the single and multi thread cases.

> I wonder whether a better solution would be something completely different.
> Like, say, make it so that gdb always remains attached to the child until it
> execs/exits, in which case we would no longer need to remove breakpoints from the
> parent/child, and thus we could let all the parent's threads run as well.
> 
> We actually do that for follow-fork parent / detach-on-fork-on and
> vfork -- see inferior::pending_detach.

You mean "follow-fork child" I think.  In that case, we keep the parent
attached until the child exec or exits.

I think it would be possible to do what you describe.  If we did this, I
think we could make it so we never resume a vfork parent thread, to
avoid getting into the situation where we want to interrupt / stop a
stuck vfork parent thread.  In my experience that's not possible, this
thread will not report a stop until the vfork child has exited/execed.
Although I haven't tried with SIGSTOP.


> Anyhow, onward.  With your currently solution at least the common scenario
> works, and we have a new testcase that helps whatever redesign in future.

Yeah, I think that pragamatically it's a step forward, assuming a
"collaborating" vfork children that doesn't hang.

> Despite these concerns, I'm actually happy to get this into the tree.  I think
> it's still better than what we have today.

Cool, thanks.

>> +
>> +  /* The rest of the function assumes non-stop==off and
>> +     target-non-stop==off.
>> +
>> +     If a thread is waiting for a vfork-done event, it means breakpoints are out
>> +     for this inferior (well, program space in fact).  We don't want to resume
>> +     any thread other than the one waiting for vfork done, otherwise these other
>> +     threads could miss breakpoints.  So if a thread in the resumption set is
>> +     waiting for a vfork-done event, resume only that thread.
>> +
>> +     The resumption set width depends on whether schedule-multiple is on or off.
>> +
>> +     Note that if the target_resume interface was more flexible, we could be
>> +     smarter here when schedule-multiple is on.  For example, imagine 3
>> +     inferiors with 2 threads each (1.1, 1.2, 2.1, 2.2, 3.1 and 3.2).  Threads
>> +     2.1 and 3.2 are both waiting for a vfork-done event.  Then we could ask the
>> +     target(s) to resume:
>> +
>> +      - All threads of inferior 1
>> +      - Thread 2.1
>> +      - Thread 3.2
>> +
>> +     Since we don't have that flexibility (we can only pass one ptid), just
>> +     resume the first thread waiting for a vfork-done event we find (e.g. thread
>> +     2.1).  */
> 
> We actually have the flexibility, but we don't use it.  I mean, we could use
> the commit_resumed mechanism in all-stop as well, which would let us prepare such
> a resume with multiple target_resume calls and final commit.  For a rainy day.

Ah, yeah.  We could make internal_resume_ptid return a vector of ptids,
and have do_target_resume call target_resume multiple times.  Even
without commit_resumed... that should work?  I think?

>> +  if (sched_multi)
>> +    {
>> +      for (inferior *inf : all_non_exited_inferiors ())
>> +	if (inf->thread_waiting_for_vfork_done != nullptr)
>> +	  return inf->thread_waiting_for_vfork_done->ptid;
>> +    }
>> +  else if (current_inferior ()->thread_waiting_for_vfork_done != nullptr)
>> +    return current_inferior ()->thread_waiting_for_vfork_done->ptid;
>> +
>> +  return user_visible_resume_ptid (user_step);
>>  }
>>  
>>  /* Wrapper for target_resume, that handles infrun-specific
>> @@ -3258,6 +3360,19 @@ proceed (CORE_ADDR addr, enum gdb_signal siggnal)
>>  		continue;
> 
> 
>> diff --git a/gdb/testsuite/gdb.threads/vfork-multi-thread.c b/gdb/testsuite/gdb.threads/vfork-multi-thread.c
>> new file mode 100644
>> index 000000000000..cd3dfcc1e653
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.threads/vfork-multi-thread.c
>> @@ -0,0 +1,74 @@
>> +#include <assert.h>
> 
> Missing copyright header.

Done.

>> +#include <pthread.h>
>> +#include <stdio.h>
>> +#include <unistd.h>
>> +#include <sys/wait.h>
>> +
>> +// Start with:
>> +//
>> +//   ./gdb -nx -q --data-directory=data-directory repro -ex "tb break_here_first" -ex r -ex "b should_break_here"
>> +//
>> +// Then do "continue".
>> +//
>> +// The main thread will likely cross should_break_here while we are handling
>> +// the vfork and breakpoints are removed, therefore missing the breakpoint.
> 
> The whole file needs a pass to adjust it to GNU conventions/formatting.

Yeah, I did that after Lancelot's comments on the v2.

>> +
>> +static volatile int release_vfork = 0;
>> +static volatile int release_main = 0;
>> +
>> +static void *vforker(void *arg)
>> +{
>> +  while (!release_vfork);
> 
> I'd add some usleep(1) here, to avoid hogging the cpu.

Ok.

>> +
>> +  pid_t pid = vfork ();
>> +  if (pid == 0) {
>> +    /* A vfork child is not supposed to mess with the state of the program,
>> +       but it is helpful for the purpose of this test.  */
>> +    release_main = 1;
>> +    _exit(7);
>> +  }
>> +
>> +  int stat;
>> +  int ret = waitpid (pid, &stat, 0);
>> +  assert (ret == pid);
>> +  assert (WIFEXITED (stat));
>> +  assert (WEXITSTATUS (stat) == 7);
>> +
>> +  return NULL;
>> +}
>> +
>> +static void should_break_here(void) {}
>> +
>> +int main()
>> +{
>> +
>> +  pthread_t thread;
>> +  int ret = pthread_create(&thread, NULL, vforker, NULL);
>> +  assert(ret == 0);
>> +
>> +  /* We break here first, while the thread is stuck on `!release_fork`.  */
>> +  release_vfork = 1;
>> +
>> +  /* We set a breakpoint on should_break_here.
>> +
>> +     We then set "release_fork" from the debugger and continue.  The main
>> +     thread hangs on `!release_main` while the non-main thread vforks.  During
>> +     the window of time where the two processes have a shared address space
>> +     (after vfork, before _exit), GDB removes the breakpoints from the address
>> +     space.  During that window, only the vfork-ing thread (the non-main
>> +     thread) is frozen by the kernel.  The main thread is free to execute.  The
>> +     child process sets `release_main`, releasing the main thread. A buggy GDB
>> +     would let the main thread execute during that window, leading to the
>> +     breakpoint on should_break_here being missed.  A fixed GDB does not resume
>> +     the threads of the vforking process other than the vforking thread.  When
>> +     the vfork child exits, the fixed GDB resumes the main thread, after
>> +     breakpoints are reinserted, so the breakpoint is not missed.  */
>> +
>> +  while (!release_main);
> 
> Ditto, usleep.

I don't know about this one, if it would affect the ability of the test
to reproduce the failure.  For the failure to be reproduced, we want
that main thread to exit that loop as fast as possible once release_main
is set and dash through "should_break_here".  If we make it sleep, it
increases the chances that the vfork child calls exit before the main
thread crosses should_break_here, therefore not exposing the problem.

I gave it a try, and with a usleep I see some failures (when removing
the fix), meaning the test still does its job, so I can add it.  If I
try with a "sleep (1)" though, the failures disappear.

Simon

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/8] gdb: resume ongoing step after handling fork or vfork
  2022-03-31 18:28   ` Pedro Alves
@ 2022-04-01 18:42     ` Simon Marchi
  0 siblings, 0 replies; 23+ messages in thread
From: Simon Marchi @ 2022-04-01 18:42 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches; +Cc: Simon Marchi

>> +#include <pthread.h>
>> +#include <unistd.h>
>> +#include <sys/types.h>
>> +#include <sys/wait.h>
>> +#include <errno.h>
>> +#include <assert.h>
>> +#include <limits.h>
>> +
>> +/* Number of threads doing forks.  */
>> +#define N_FORKERS 4
>> +
>> +static void *
>> +forker (void *arg)
>> +{
>> +  for (;;)
>> +    {
>> +      pid_t pid = FORK_FUNC ();
>> +
>> +      if (pid == 0)
>> +	_exit(11);
> 
> Missing space before parens.

Done.

> 
>> +
>> +      assert (pid > 0);
>> +
>> +      /* Wait for children to exit.  */
>> +      int ret;
>> +      int stat;
>> +      do {
> 
> { on the next line.  Thus:

Done (and re-indented the whole do-while).  Is there a missing part to
your comment that should come after "Thus:"?

> 
>> +        ret = waitpid (pid, &stat, 0);
>> +      } while (ret == EINTR);
>> +
>> +      assert (ret == pid);
>> +      assert (WIFEXITED (stat));
>> +      assert (WEXITSTATUS (stat) == 11);
>> +
>> +      usleep (40 * 1000);
> 
> Why not sleep_a_bit()?  Is the fact that it's a little less time significant?

Ah, good question.  We need the forking threads to fork reasonably
often, if we want the bug to reproduce most of the time.  But we need
some sleep, with any sleep the forking threads just spam events and
starve the stepping thread, and we don't really make progress.  I'll add
a comment to that effect.

> 
>> +    }
>> +
>> +  return NULL;
>> +}
>> +
>> +static void
>> +sleep_a_bit (void)
>> +{
>> +  usleep (1000 * 50);
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  alarm (60);
>> +
>> +  pthread_t thread[N_FORKERS];
>> +  for (int i = 0; i < N_FORKERS; ++i)
> 
> Declare "int i" outside the loop so this compiles as C90 as well.  Ditto below.

Done.

Simon

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-04-01 18:43 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-17 16:27 [PATCH 0/8] Some fixes for handling vfork by multi-threaded programs Simon Marchi
2022-01-17 16:27 ` [PATCH 1/8] gdb/infrun: add reason parameter to stop_all_threads Simon Marchi
2022-03-31 15:05   ` Pedro Alves
2022-03-31 15:35     ` Simon Marchi
2022-01-17 16:27 ` [PATCH 2/8] gdb/linux-nat: remove check based on current_inferior in linux_handle_extended_wait Simon Marchi
2022-03-31 16:12   ` Pedro Alves
2022-03-31 17:06     ` Simon Marchi
2022-01-17 16:27 ` [PATCH 3/8] gdb: replace inferior::waiting_for_vfork_done with inferior::thread_waiting_for_vfork_done Simon Marchi
2022-03-31 18:17   ` Pedro Alves
2022-04-01 14:25     ` Simon Marchi
2022-01-17 16:27 ` [PATCH 4/8] gdb/infrun: add inferior parameters to stop_all_threads and restart_threads Simon Marchi
2022-03-31 18:17   ` Pedro Alves
2022-01-17 16:27 ` [PATCH 5/8] gdb/infrun: add logging statement to do_target_resume Simon Marchi
2022-03-31 18:18   ` Pedro Alves
2022-01-17 16:27 ` [PATCH 6/8] gdb: fix handling of vfork by multi-threaded program (follow-fork-mode=parent, detach-on-fork=on) Simon Marchi
2022-03-31 18:21   ` Pedro Alves
2022-04-01 17:28     ` Simon Marchi
2022-01-17 16:27 ` [PATCH 7/8] gdbserver: report correct status in thread stop race condition Simon Marchi
2022-03-31 18:21   ` Pedro Alves
2022-01-17 16:27 ` [PATCH 8/8] gdb: resume ongoing step after handling fork or vfork Simon Marchi
2022-03-31 18:22   ` Pedro Alves
2022-03-31 18:28   ` Pedro Alves
2022-04-01 18:42     ` Simon Marchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).