[PATCH v2 0/2] PR threads/18600: Threads left stopped after fork+thread spawn

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [PATCH v2 0/2] PR threads/18600: Threads left stopped after fork+thread spawn
@ 2015-07-23 17:24 Pedro Alves
  2015-07-23 17:25 ` [PATCH v2 2/2] PR threads/18600: Inferiors left around " Pedro Alves
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Pedro Alves @ 2015-07-23 17:24 UTC (permalink / raw)
  To: gdb-patches

This is intended for both master and 7.10.

The discussions on PR threads/18600 at:

 https://sourceware.org/ml/gdb-patches/2015-07/msg00153.html

identified two problems.  These patches are another revision of the
patches original posted at:

    [1] https://sourceware.org/ml/gdb-patches/2015-07/msg00186.html
    [2] https://sourceware.org/ml/gdb-patches/2015-07/msg00190.html

Which later Simon cleaned up a bit and wrote a test for:
  https://sourceware.org/ml/gdb-patches/2015-07/msg00595.html

This revision addresses the comments I made to Simon's version.

Tested on x86_64 Fedora 20, native, remote and extended-remote
gdbserver.

Pedro Alves (2):
  PR threads/18600: Threads left stopped after fork+thread spawn
  PR threads/18600: Inferiors left around after fork+thread spawn

 gdb/linux-nat.c                                 | 111 ++++++++++++-----------
 gdb/testsuite/gdb.threads/fork-plus-threads.c   | 115 ++++++++++++++++++++++++
 gdb/testsuite/gdb.threads/fork-plus-threads.exp |  65 ++++++++++++++
 3 files changed, 235 insertions(+), 56 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/fork-plus-threads.c
 create mode 100644 gdb/testsuite/gdb.threads/fork-plus-threads.exp

-- 
1.9.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 2/2] PR threads/18600: Inferiors left around after fork+thread spawn
  2015-07-23 17:24 [PATCH v2 0/2] PR threads/18600: Threads left stopped after fork+thread spawn Pedro Alves
@ 2015-07-23 17:25 ` Pedro Alves
  2015-07-23 17:25 ` [PATCH v2 1/2] PR threads/18600: Threads left stopped " Pedro Alves
  2015-07-23 18:21 ` [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode Pedro Alves
  2 siblings, 0 replies; 16+ messages in thread
From: Pedro Alves @ 2015-07-23 17:25 UTC (permalink / raw)
  To: gdb-patches

The new gdb.threads/fork-plus-threads.exp test exposes one more
problem.  When one types "info inferiors" after running the program,
one see's a couple inferior left still, while there should only be
inferior #1 left.  E.g.:

 (gdb) info inferiors
   Num  Description       Executable
   4    process 8393      /home/pedro/bugs/src/test
   2    process 8388      /home/pedro/bugs/src/test
 * 1    <null>            /home/pedro/bugs/src/test
 (gdb) info threads

Calling prune_inferiors() manually at this point (from a top gdb) does
not remove them, because they still have inf->pid != 0 (while they
shouldn't).  This suggests that we never mourned those inferiors.

Enabling logs (master + previous patch) we see:

 ...
 WL: waitpid Thread 0x7ffff7fc2740 (LWP 9513) received Trace/breakpoint trap (stopped)
 WL: Handling extended status 0x03057f
 LHEW: Got clone event from LWP 9513, new child is LWP 9579
 [New Thread 0x7ffff37b8700 (LWP 9579)]
 WL: waitpid Thread 0x7ffff7fc2740 (LWP 9508) received 0 (exited)
 WL: Thread 0x7ffff7fc2740 (LWP 9508) exited.
			    ^^^^^^^^
 [Thread 0x7ffff7fc2740 (LWP 9508) exited]
 WL: waitpid Thread 0x7ffff7fc2740 (LWP 9499) received 0 (exited)
 WL: Thread 0x7ffff7fc2740 (LWP 9499) exited.
 [Thread 0x7ffff7fc2740 (LWP 9499) exited]
 RSRL: resuming stopped-resumed LWP Thread 0x7ffff37b8700 (LWP 9579) at 0x3615ef4ce1: step=0
 ...
 (gdb) info inferiors
   Num  Description       Executable
   5    process 9508      /home/pedro/bugs/src/test
		^^^^
   4    process 9503      /home/pedro/bugs/src/test
   3    process 9500      /home/pedro/bugs/src/test
   2    process 9499      /home/pedro/bugs/src/test
 * 1    <null>            /home/pedro/bugs/src/test
 (gdb)
 ...

Note the "Thread 0x7ffff7fc2740 (LWP 9508) exited." line.
That's this in wait_lwp:

      /* Check if the thread has exited.  */
      if (WIFEXITED (status) || WIFSIGNALED (status))
	{
	  thread_dead = 1;
	  if (debug_linux_nat)
	    fprintf_unfiltered (gdb_stdlog, "WL: %s exited.\n",
				target_pid_to_str (lp->ptid));
	}
    }

That was the leader thread reporting an exit, meaning the whole
process is gone.  So the problem is that this code doesn't understand
that an WIFEXITED status of the leader LWP should be reported to
infrun as process exit.

gdb/ChangeLog:
2015-07-23  Pedro Alves  <palves@redhat.com>

	PR threads/18600
	* linux-nat.c (wait_lwp): Report to the core when thread group
	leader exits.

gdb/testsuite/ChangeLog:
2015-07-23  Pedro Alves  <palves@redhat.com>

	PR threads/18600
	* gdb.threads/fork-plus-threads.exp: Test that "info inferiors"
	only shows inferior 1.
---
 gdb/linux-nat.c                                 | 14 ++++++++++++++
 gdb/testsuite/gdb.threads/fork-plus-threads.exp |  4 ++++
 2 files changed, 18 insertions(+)

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 272b919..74d5997 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -2274,6 +2274,20 @@ wait_lwp (struct lwp_info *lp)
       /* Check if the thread has exited.  */
       if (WIFEXITED (status) || WIFSIGNALED (status))
 	{
+	  if (ptid_get_pid (lp->ptid) == ptid_get_lwp (lp->ptid))
+	    {
+	      if (debug_linux_nat)
+		fprintf_unfiltered (gdb_stdlog, "WL: Process %d exited.\n",
+				    ptid_get_pid (lp->ptid));
+
+	      /* This is the leader exiting, it means the whole
+		 process is gone.  Store the status to report to the
+		 core.  Store it in lp->waitstatus, because lp->status
+		 would be ambiguous (W_EXITCODE(0,0) == 0).  */
+	      store_waitstatus (&lp->waitstatus, status);
+	      return 0;
+	    }
+
 	  thread_dead = 1;
 	  if (debug_linux_nat)
 	    fprintf_unfiltered (gdb_stdlog, "WL: %s exited.\n",
diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
index 9989346..f44dd76 100644
--- a/gdb/testsuite/gdb.threads/fork-plus-threads.exp
+++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
@@ -58,4 +58,8 @@ gdb_test_multiple "" $test {
 gdb_test "info threads" "No threads\." \
     "no threads left"
 
+gdb_test "info inferiors" \
+    "Num\[ \t\]+Description\[ \t\]+Executable\[ \t\]+\r\n\\* 1 \[^\r\n\]+" \
+    "only inferior 1 left"
+
 set GDBFLAGS $saved_gdbflags
-- 
1.9.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/2] PR threads/18600: Threads left stopped after fork+thread spawn
  2015-07-23 17:24 [PATCH v2 0/2] PR threads/18600: Threads left stopped after fork+thread spawn Pedro Alves
  2015-07-23 17:25 ` [PATCH v2 2/2] PR threads/18600: Inferiors left around " Pedro Alves
@ 2015-07-23 17:25 ` Pedro Alves
  2015-07-30 18:08   ` [pushed] " Pedro Alves
  2015-07-23 18:21 ` [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode Pedro Alves
  2 siblings, 1 reply; 16+ messages in thread
From: Pedro Alves @ 2015-07-23 17:25 UTC (permalink / raw)
  To: gdb-patches

When a program forks and another process start threads while gdb is
handling the fork event, newly created threads are left stuck stopped
by gdb, even though gdb presents them as "running", to the user.

This can be seen with the test added by this patch.  The test has the
inferior fork a certain number of times and waits for all children to
exit.  Each fork child spawns a number of threads that do nothing and
joins them immediately.  Normally, the program should run unimpeded
(from the point of view of the user) and exit very quickly.  Without
this fix, it doesn't because of some threads left stopped by gdb, so
inferior 1 never exits.

The program triggers when a new clone thread is found while inside the
linux_stop_and_wait_all_lwps call in linux-thread-db.c:

      linux_stop_and_wait_all_lwps ();

      ALL_LWPS (lp)
	if (ptid_get_pid (lp->ptid) == pid)
	  thread_from_lwp (lp->ptid);

      linux_unstop_all_lwps ();

Within linux_stop_and_wait_all_lwps, we reach
linux_handle_extended_wait with the "stopping" parameter set to 1, and
because of that we don't mark the new lwp as resumed.  As consequence,
the subsequent resume_stopped_resumed_lwps, called from
linux_unstop_all_lwps, never resumes the new LWP.

There's lots of cruft in linux_handle_extended_wait that no longer
makes sense.  On systems with CLONE events support, we don't rely on
libthread_db for thread listing anymore, so the code that preserves
stop_requested and the handling of last_resume_kind is all dead.

So the fix is to remove all that, and simply always mark the new LWP
as resumed, so that resume_stopped_resumed_lwps re-resumes it.

gdb/ChangeLog:
2015-07-23  Pedro Alves  <palves@redhat.com>
	    Simon Marchi  <simon.marchi@ericsson.com>

	PR threads/18600
	* linux-nat.c (linux_handle_extended_wait): On CLONE event, always
	mark the new thread as resumed.  Remove STOPPING parameter.
	(wait_lwp): Adjust call to linux_handle_extended_wait.
	(linux_nat_filter_event): Adjust call to
	linux_handle_extended_wait.
	(resume_stopped_resumed_lwps): Add debug output.

gdb/testsuite/ChangeLog:
2015-07-23  Simon Marchi  <simon.marchi@ericsson.com>
	    Pedro Alves  <palves@redhat.com>

	PR threads/18600
	* gdb.threads/fork-plus-threads.c: New file.
	* gdb.threads/fork-plus-threads.exp: New file.
---
 gdb/linux-nat.c                                 |  97 +++++++++-----------
 gdb/testsuite/gdb.threads/fork-plus-threads.c   | 115 ++++++++++++++++++++++++
 gdb/testsuite/gdb.threads/fork-plus-threads.exp |  61 +++++++++++++
 3 files changed, 217 insertions(+), 56 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/fork-plus-threads.c
 create mode 100644 gdb/testsuite/gdb.threads/fork-plus-threads.exp

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index be429f8..272b919 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -2000,8 +2000,7 @@ linux_handle_syscall_trap (struct lwp_info *lp, int stopping)
    true, the new LWP remains stopped, otherwise it is continued.  */
 
 static int
-linux_handle_extended_wait (struct lwp_info *lp, int status,
-			    int stopping)
+linux_handle_extended_wait (struct lwp_info *lp, int status)
 {
   int pid = ptid_get_lwp (lp->ptid);
   struct target_waitstatus *ourstatus = &lp->waitstatus;
@@ -2071,7 +2070,7 @@ linux_handle_extended_wait (struct lwp_info *lp, int status,
 	ourstatus->kind = TARGET_WAITKIND_FORKED;
       else if (event == PTRACE_EVENT_VFORK)
 	ourstatus->kind = TARGET_WAITKIND_VFORKED;
-      else
+      else if (event == PTRACE_EVENT_CLONE)
 	{
 	  struct lwp_info *new_lp;
 
@@ -2086,43 +2085,7 @@ linux_handle_extended_wait (struct lwp_info *lp, int status,
 	  new_lp = add_lwp (ptid_build (ptid_get_pid (lp->ptid), new_pid, 0));
 	  new_lp->cloned = 1;
 	  new_lp->stopped = 1;
-
-	  if (WSTOPSIG (status) != SIGSTOP)
-	    {
-	      /* This can happen if someone starts sending signals to
-		 the new thread before it gets a chance to run, which
-		 have a lower number than SIGSTOP (e.g. SIGUSR1).
-		 This is an unlikely case, and harder to handle for
-		 fork / vfork than for clone, so we do not try - but
-		 we handle it for clone events here.  We'll send
-		 the other signal on to the thread below.  */
-
-	      new_lp->signalled = 1;
-	    }
-	  else
-	    {
-	      struct thread_info *tp;
-
-	      /* When we stop for an event in some other thread, and
-		 pull the thread list just as this thread has cloned,
-		 we'll have seen the new thread in the thread_db list
-		 before handling the CLONE event (glibc's
-		 pthread_create adds the new thread to the thread list
-		 before clone'ing, and has the kernel fill in the
-		 thread's tid on the clone call with
-		 CLONE_PARENT_SETTID).  If that happened, and the core
-		 had requested the new thread to stop, we'll have
-		 killed it with SIGSTOP.  But since SIGSTOP is not an
-		 RT signal, it can only be queued once.  We need to be
-		 careful to not resume the LWP if we wanted it to
-		 stop.  In that case, we'll leave the SIGSTOP pending.
-		 It will later be reported as GDB_SIGNAL_0.  */
-	      tp = find_thread_ptid (new_lp->ptid);
-	      if (tp != NULL && tp->stop_requested)
-		new_lp->last_resume_kind = resume_stop;
-	      else
-		status = 0;
-	    }
+	  new_lp->resumed = 1;
 
 	  /* If the thread_db layer is active, let it record the user
 	     level thread id and status, and add the thread to GDB's
@@ -2136,19 +2099,23 @@ linux_handle_extended_wait (struct lwp_info *lp, int status,
 	    }
 
 	  /* Even if we're stopping the thread for some reason
-	     internal to this module, from the user/frontend's
-	     perspective, this new thread is running.  */
+	     internal to this module, from the perspective of infrun
+	     and the user/frontend, this new thread is running until
+	     it next reports a stop.  */
 	  set_running (new_lp->ptid, 1);
-	  if (!stopping)
-	    {
-	      set_executing (new_lp->ptid, 1);
-	      /* thread_db_attach_lwp -> lin_lwp_attach_lwp forced
-		 resume_stop.  */
-	      new_lp->last_resume_kind = resume_continue;
-	    }
+	  set_executing (new_lp->ptid, 1);
 
-	  if (status != 0)
+	  if (WSTOPSIG (status) != SIGSTOP)
 	    {
+	      /* This can happen if someone starts sending signals to
+		 the new thread before it gets a chance to run, which
+		 have a lower number than SIGSTOP (e.g. SIGUSR1).
+		 This is an unlikely case, and harder to handle for
+		 fork / vfork than for clone, so we do not try - but
+		 we handle it for clone events here.  */
+
+	      new_lp->signalled = 1;
+
 	      /* We created NEW_LP so it cannot yet contain STATUS.  */
 	      gdb_assert (new_lp->status == 0);
 
@@ -2162,7 +2129,6 @@ linux_handle_extended_wait (struct lwp_info *lp, int status,
 	      new_lp->status = status;
 	    }
 
-	  new_lp->resumed = !stopping;
 	  return 1;
 	}
 
@@ -2353,7 +2319,7 @@ wait_lwp (struct lwp_info *lp)
 	fprintf_unfiltered (gdb_stdlog,
 			    "WL: Handling extended status 0x%06x\n",
 			    status);
-      linux_handle_extended_wait (lp, status, 1);
+      linux_handle_extended_wait (lp, status);
       return 0;
     }
 
@@ -3155,7 +3121,7 @@ linux_nat_filter_event (int lwpid, int status)
 	fprintf_unfiltered (gdb_stdlog,
 			    "LLW: Handling extended status 0x%06x\n",
 			    status);
-      if (linux_handle_extended_wait (lp, status, 0))
+      if (linux_handle_extended_wait (lp, status))
 	return NULL;
     }
 
@@ -3673,9 +3639,28 @@ resume_stopped_resumed_lwps (struct lwp_info *lp, void *data)
 {
   ptid_t *wait_ptid_p = data;
 
-  if (lp->stopped
-      && lp->resumed
-      && !lwp_status_pending_p (lp))
+  if (!lp->stopped)
+    {
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog,
+			    "RSRL: NOT resuming LWP %s, not stopped\n",
+			    target_pid_to_str (lp->ptid));
+    }
+  else if (!lp->resumed)
+    {
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog,
+			    "RSRL: NOT resuming LWP %s, not resumed\n",
+			    target_pid_to_str (lp->ptid));
+    }
+  else if (lwp_status_pending_p (lp))
+    {
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog,
+			    "RSRL: NOT resuming LWP %s, has pending status\n",
+			    target_pid_to_str (lp->ptid));
+    }
+  else
     {
       struct regcache *regcache = get_thread_regcache (lp->ptid);
       struct gdbarch *gdbarch = get_regcache_arch (regcache);
diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.c b/gdb/testsuite/gdb.threads/fork-plus-threads.c
new file mode 100644
index 0000000..780a4b8
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/fork-plus-threads.c
@@ -0,0 +1,115 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2015 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <pthread.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+
+/* Number of times the main process forks.  */
+#define NFORKS 10
+
+/* Number of threads by each fork child.  */
+#define NTHREADS 10
+
+static void *
+thread_func (void *arg)
+{
+  /* Empty.  */
+}
+
+static void
+fork_child (void)
+{
+  pthread_t threads[NTHREADS];
+  int i;
+  int ret;
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      ret = pthread_create (&threads[i], NULL, thread_func, NULL);
+      assert (ret == 0);
+    }
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      ret = pthread_join (threads[i], NULL);
+      assert (ret == 0);
+    }
+}
+
+int
+main (void)
+{
+  pid_t childs[NFORKS];
+  int i;
+  int status;
+  int num_exited = 0;
+
+  /* Don't run forever if the wait loop below gets stuck.  */
+  alarm (180);
+
+  for (i = 0; i < NFORKS; i++)
+    {
+      pid_t pid;
+
+      pid = fork ();
+
+      if (pid > 0)
+	{
+	  /* Parent.  */
+	  childs[i] = pid;
+	}
+      else if (pid == 0)
+	{
+	  /* Child.  */
+	  fork_child ();
+	  return 0;
+	}
+      else
+	{
+	  perror ("fork");
+	  return 1;
+	}
+    }
+
+  while (num_exited != NFORKS)
+    {
+      pid_t pid = wait (&status);
+
+      if (pid == -1)
+	{
+	  perror ("wait");
+	  return 1;
+	}
+
+      if (WIFEXITED (status))
+	{
+	  num_exited++;
+	}
+      else
+	{
+	  printf ("Hmm, unexpected wait status 0x%x from child %d\n", status,
+		  pid);
+	}
+    }
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
new file mode 100644
index 0000000..9989346
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
@@ -0,0 +1,61 @@
+# Copyright (C) 2015 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This test verifies that threads created by the child fork are
+# properly handled.  Specifically, GDB used to have a bug where it
+# would leave child fork threads stuck stopped, even though "info
+# threads" would show them running.
+#
+# See https://sourceware.org/bugzilla/show_bug.cgi?id=18600
+
+# gdbserver's fork support is broken in non-stop mode.
+#if [gdb_is_target_remote] {
+#    return
+#}
+
+standard_testfile
+set saved_gdbflags $GDBFLAGS
+set GDBFLAGS [concat $GDBFLAGS " -ex \"set non-stop on\""]
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile {debug pthreads}] == -1} {
+    set GDBFLAGS $saved_gdbflags
+    return -1
+}
+
+if ![runto_main] then {
+    fail "Can't run to main"
+    set GDBFLAGS $saved_gdbflags
+    return 0
+}
+
+gdb_test_no_output "set detach-on-fork off"
+set test "continue &"
+gdb_test_multiple $test $test {
+    -re "$gdb_prompt " {
+	pass $test
+    }
+}
+
+set test "reached breakpoint"
+gdb_test_multiple "" $test {
+    -re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
+	pass $test
+    }
+}
+
+gdb_test "info threads" "No threads\." \
+    "no threads left"
+
+set GDBFLAGS $saved_gdbflags
-- 
1.9.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode
  2015-07-23 17:24 [PATCH v2 0/2] PR threads/18600: Threads left stopped after fork+thread spawn Pedro Alves
  2015-07-23 17:25 ` [PATCH v2 2/2] PR threads/18600: Inferiors left around " Pedro Alves
  2015-07-23 17:25 ` [PATCH v2 1/2] PR threads/18600: Threads left stopped " Pedro Alves
@ 2015-07-23 18:21 ` Pedro Alves
  2015-07-24 18:05   ` Simon Marchi
  2015-07-24 18:43   ` Don Breazeal
  2 siblings, 2 replies; 16+ messages in thread
From: Pedro Alves @ 2015-07-23 18:21 UTC (permalink / raw)
  To: Breazeal, Don, Simon Marchi; +Cc: GDB Patches

So I managed to extract out this smaller patch from the
gdbserver fixes I mentioned.  I think this one looks safe enough
for 7.10.  WDYT?

-----------
From 98d41152bff2a21f7fda864d87ee5dd0cffa2d17 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Thu, 23 Jul 2015 18:49:51 +0100
Subject: [PATCH] remote follow fork and spurious child stops in non-stop mode

Running gdb.threads/fork-plus-threads.exp against gdbserver in
extended-remote mode, even though the test passes, we still see broken
behavior:

Running gdb.threads/fork-plus-threads.exp against gdbserver in
extended-remote mode, even though the test passes, we still see broken
behavior:

 (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off
 continue &
 Continuing.
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue &
 [New Thread 28092.28092]

 [Thread 28092.28092] #2 stopped.
 [New Thread 28094.28094]
 [Inferior 2 (process 28092) exited normally]
 [New Thread 28094.28105]
 [New Thread 28094.28109]

...

[Thread 28174.28174] #18 stopped.
 [New Thread 28185.28185]
 [Inferior 10 (process 28174) exited normally]
 [New Thread 28185.28196]

 [Thread 28185.28185] #20 stopped.
 Cannot remove breakpoints because program is no longer writable.
 Further execution is probably impossible.
 [Inferior 11 (process 28185) exited normally]
 [Inferior 1 (process 28091) exited normally]
 PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint
 info threads
 No threads.
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left
 info inferiors
   Num  Description       Executable
 * 1    <null>            /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left

All the "[Thread FOO] #NN stopped." above are bogus, as well as the
"Cannot remove breakpoints because program is no longer writable.",
which is a consequence.

The problem is that when we intercept a fork event, we should report
the event for the parent, only, and leave the child stopped, but not
report its stop event.  GDB later decides whether to follow the parent
or the child.  But because handle_extended_wait does not set the
child's last_status.kind to TARGET_WAITKIND_STOPPED, a
stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access
memory) by mistake ends up queueing a SIGSTOP on the child, resuming
it, and then when that SIGSTOP is intercepted, because the LWP has
last_resume_kind set to resume_stop, gdbserver reports the stop to
GDB, as GDB_SIGNAL_0:

...
 >>>> entering unstop_all_lwps
 unstopping all lwps
 proceed_one_lwp: lwp 1600
    client wants LWP to remain 1600 stopped
 proceed_one_lwp: lwp 1828
 Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Sending sigstop to lwp 1828
 pc is 0x3615ebc7cc
 Resuming lwp 1828 (continue, signal 0, stop expected)
   continue from pc 0x3615ebc7cc
 unstop_all_lwps done
 sigchld_handler
 <<<< exiting unstop_all_lwps
 handling possible target event
 >>>> entering linux_wait_1
 linux_wait_1: [<all threads>]
 my_waitpid (-1, 0x40000001)
 my_waitpid (-1, 0x1): status(137f), 1828
 LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK
 LLW: waitpid 1828 received Stopped (signal) (stopped)
 pc is 0x3615ebc7cc
 Expected stop.
 LLW: resume_stop SIGSTOP caught for LWP 1828.1828.
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
 linux_wait_1 ret = LWP 1828.1828, 1, 0
 <<<< exiting linux_wait_1
 Writing resume reply for LWP 1828.1828:1
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By inspection, I also noticed that we miss leaving the child with the
suspend count incremented if stopping threads, like we do for clone
threads.

Tested on x86_64 Fedora 20, extended-remote.

gdb/gdbserver/ChangeLog:
2015-07-23  Pedro Alves  <palves@redhat.com>

	* linux-low.c (handle_extended_wait): Set the child's last
	reported status to TARGET_WAITKIND_STOPPED.
---
 gdb/gdbserver/linux-low.c                       |  7 ++++++
 gdb/testsuite/gdb.threads/fork-plus-threads.exp | 30 +++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 17b2a51..56a33ff 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -488,6 +488,13 @@ handle_extended_wait (struct lwp_info *event_lwp, int wstat)
 	  child_lwp->status_pending_p = 0;
 	  child_thr = get_lwp_thread (child_lwp);
 	  child_thr->last_resume_kind = resume_stop;
+	  child_thr->last_status.kind = TARGET_WAITKIND_STOPPED;
+
+	  /* If we're suspending all threads, leave this one suspended
+	     too.  */
+	  if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS)
+	    child_lwp->suspended = 1;
+
 	  parent_proc = get_thread_process (event_thr);
 	  child_proc->attached = parent_proc->attached;
 	  clone_all_breakpoints (&child_proc->breakpoints,
diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
index f44dd76..80d2464 100644
--- a/gdb/testsuite/gdb.threads/fork-plus-threads.exp
+++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
@@ -48,13 +48,43 @@ gdb_test_multiple $test $test {
     }
 }
 
+# gdbserver had a bug that resulted in reporting the fork child's
+# initial stop to gdb, which gdb does not expect, in turn resulting in
+# a broken session, like:
+#
+#  [Thread 31536.31536] #16 stopped.                                   <== BAD
+#  [New Thread 31547.31547]
+#  [Inferior 10 (process 31536) exited normally]
+#  [New Thread 31547.31560]
+#
+#  [Thread 31547.31547] #18 stopped.                                   <== BAD
+#  Cannot remove breakpoints because program is no longer writable.    <== BAD
+#  Further execution is probably impossible.                           <== BAD
+#  [Inferior 11 (process 31547) exited normally]
+#  [Inferior 1 (process 31454) exited normally]
+#
+# These variables track whether we see such broken behavior.
+set saw_cannot_remove_breakpoints 0
+set saw_thread_stopped 0
+
 set test "reached breakpoint"
 gdb_test_multiple "" $test {
+    -re "Cannot remove breakpoints" {
+	set saw_cannot_remove_breakpoints 1
+	exp_continue
+    }
+    -re "Thread \[^\r\n\]+ stopped\\." {
+	set saw_thread_stopped 1
+	exp_continue
+    }
     -re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
 	pass $test
     }
 }
 
+gdb_assert !$saw_cannot_remove_breakpoints "no failure to remove breakpoints"
+gdb_assert !$saw_thread_stopped "no spurious thread stop"
+
 gdb_test "info threads" "No threads\." \
     "no threads left"
 
-- 
1.9.3


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode
  2015-07-23 18:21 ` [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode Pedro Alves
@ 2015-07-24 18:05   ` Simon Marchi
  2015-07-24 18:17     ` Pedro Alves
  2015-07-24 18:43   ` Don Breazeal
  1 sibling, 1 reply; 16+ messages in thread
From: Simon Marchi @ 2015-07-24 18:05 UTC (permalink / raw)
  To: Pedro Alves, Breazeal, Don; +Cc: GDB Patches

On 15-07-23 02:21 PM, Pedro Alves wrote:
> So I managed to extract out this smaller patch from the
> gdbserver fixes I mentioned.  I think this one looks safe enough
> for 7.10.  WDYT?
> 
> -----------
> From 98d41152bff2a21f7fda864d87ee5dd0cffa2d17 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Thu, 23 Jul 2015 18:49:51 +0100
> Subject: [PATCH] remote follow fork and spurious child stops in non-stop mode
> 
> Running gdb.threads/fork-plus-threads.exp against gdbserver in
> extended-remote mode, even though the test passes, we still see broken
> behavior:
> 
> Running gdb.threads/fork-plus-threads.exp against gdbserver in
> extended-remote mode, even though the test passes, we still see broken
> behavior:
> 
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off
>  continue &
>  Continuing.
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue &
>  [New Thread 28092.28092]
> 
>  [Thread 28092.28092] #2 stopped.
>  [New Thread 28094.28094]
>  [Inferior 2 (process 28092) exited normally]
>  [New Thread 28094.28105]
>  [New Thread 28094.28109]
> 
> ...
> 
> [Thread 28174.28174] #18 stopped.
>  [New Thread 28185.28185]
>  [Inferior 10 (process 28174) exited normally]
>  [New Thread 28185.28196]
> 
>  [Thread 28185.28185] #20 stopped.
>  Cannot remove breakpoints because program is no longer writable.
>  Further execution is probably impossible.
>  [Inferior 11 (process 28185) exited normally]
>  [Inferior 1 (process 28091) exited normally]
>  PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint
>  info threads
>  No threads.
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left
>  info inferiors
>    Num  Description       Executable
>  * 1    <null>            /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left
> 
> All the "[Thread FOO] #NN stopped." above are bogus, as well as the
> "Cannot remove breakpoints because program is no longer writable.",
> which is a consequence.
> 
> The problem is that when we intercept a fork event, we should report
> the event for the parent, only, and leave the child stopped, but not
> report its stop event.  GDB later decides whether to follow the parent
> or the child.  But because handle_extended_wait does not set the
> child's last_status.kind to TARGET_WAITKIND_STOPPED, a
> stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access
> memory) by mistake ends up queueing a SIGSTOP on the child, resuming
> it, and then when that SIGSTOP is intercepted, because the LWP has
> last_resume_kind set to resume_stop, gdbserver reports the stop to
> GDB, as GDB_SIGNAL_0:
> 
> ...
>  >>>> entering unstop_all_lwps
>  unstopping all lwps
>  proceed_one_lwp: lwp 1600
>     client wants LWP to remain 1600 stopped
>  proceed_one_lwp: lwp 1828
>  Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  Sending sigstop to lwp 1828
>  pc is 0x3615ebc7cc
>  Resuming lwp 1828 (continue, signal 0, stop expected)
>    continue from pc 0x3615ebc7cc
>  unstop_all_lwps done
>  sigchld_handler
>  <<<< exiting unstop_all_lwps
>  handling possible target event
>  >>>> entering linux_wait_1
>  linux_wait_1: [<all threads>]
>  my_waitpid (-1, 0x40000001)
>  my_waitpid (-1, 0x1): status(137f), 1828
>  LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK
>  LLW: waitpid 1828 received Stopped (signal) (stopped)
>  pc is 0x3615ebc7cc
>  Expected stop.
>  LLW: resume_stop SIGSTOP caught for LWP 1828.1828.
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ...
>  linux_wait_1 ret = LWP 1828.1828, 1, 0
>  <<<< exiting linux_wait_1
>  Writing resume reply for LWP 1828.1828:1
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> By inspection, I also noticed that we miss leaving the child with the
> suspend count incremented if stopping threads, like we do for clone
> threads.
> 
> Tested on x86_64 Fedora 20, extended-remote.
> 
> gdb/gdbserver/ChangeLog:
> 2015-07-23  Pedro Alves  <palves@redhat.com>
> 
> 	* linux-low.c (handle_extended_wait): Set the child's last
> 	reported status to TARGET_WAITKIND_STOPPED.
> ---
>  gdb/gdbserver/linux-low.c                       |  7 ++++++
>  gdb/testsuite/gdb.threads/fork-plus-threads.exp | 30 +++++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
> index 17b2a51..56a33ff 100644
> --- a/gdb/gdbserver/linux-low.c
> +++ b/gdb/gdbserver/linux-low.c
> @@ -488,6 +488,13 @@ handle_extended_wait (struct lwp_info *event_lwp, int wstat)
>  	  child_lwp->status_pending_p = 0;
>  	  child_thr = get_lwp_thread (child_lwp);
>  	  child_thr->last_resume_kind = resume_stop;
> +	  child_thr->last_status.kind = TARGET_WAITKIND_STOPPED;
> +
> +	  /* If we're suspending all threads, leave this one suspended
> +	     too.  */
> +	  if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS)
> +	    child_lwp->suspended = 1;
> +
>  	  parent_proc = get_thread_process (event_thr);
>  	  child_proc->attached = parent_proc->attached;
>  	  clone_all_breakpoints (&child_proc->breakpoints,
> diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> index f44dd76..80d2464 100644
> --- a/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> +++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> @@ -48,13 +48,43 @@ gdb_test_multiple $test $test {
>      }
>  }
>  
> +# gdbserver had a bug that resulted in reporting the fork child's
> +# initial stop to gdb, which gdb does not expect, in turn resulting in
> +# a broken session, like:
> +#
> +#  [Thread 31536.31536] #16 stopped.                                   <== BAD
> +#  [New Thread 31547.31547]
> +#  [Inferior 10 (process 31536) exited normally]
> +#  [New Thread 31547.31560]
> +#
> +#  [Thread 31547.31547] #18 stopped.                                   <== BAD
> +#  Cannot remove breakpoints because program is no longer writable.    <== BAD
> +#  Further execution is probably impossible.                           <== BAD
> +#  [Inferior 11 (process 31547) exited normally]
> +#  [Inferior 1 (process 31454) exited normally]
> +#
> +# These variables track whether we see such broken behavior.
> +set saw_cannot_remove_breakpoints 0
> +set saw_thread_stopped 0
> +
>  set test "reached breakpoint"
>  gdb_test_multiple "" $test {
> +    -re "Cannot remove breakpoints" {
> +	set saw_cannot_remove_breakpoints 1
> +	exp_continue
> +    }
> +    -re "Thread \[^\r\n\]+ stopped\\." {
> +	set saw_thread_stopped 1
> +	exp_continue
> +    }
>      -re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
>  	pass $test
>      }
>  }
>  
> +gdb_assert !$saw_cannot_remove_breakpoints "no failure to remove breakpoints"
> +gdb_assert !$saw_thread_stopped "no spurious thread stop"
> +
>  gdb_test "info threads" "No threads\." \
>      "no threads left"

I tried it and it works as expected.  If you try the same test program in all-stop
though, fork childs are left stopped.  Is it expected?  I am not sure how forking
interacts with all-stop.

-----------------------

$ ./gdb -q -nx -ex "set detach-on-fork off"  testsuite/gdb.threads/fork-plus-threads
Reading symbols from testsuite/gdb.threads/fork-plus-threads...done.
(gdb) r &
Starting program: /home/emaisin/src/binutils-gdb/gdb/testsuite/gdb.threads/fork-plus-threads
(gdb) [Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5304]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5305]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5306]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5307]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5308]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5309]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5310]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5311]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5312]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5313]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
i th
  Id   Target Id         Frame
  11   process 5313 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  10   process 5312 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  9    process 5311 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  8    process 5310 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  7    process 5309 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  6    process 5308 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  5    process 5307 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  4    process 5306 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  3    process 5305 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  2    process 5304 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
* 1    Thread 0x7ffff7fc9740 (LWP 5300) "fork-plus-threa" (running)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode
  2015-07-24 18:05   ` Simon Marchi
@ 2015-07-24 18:17     ` Pedro Alves
  0 siblings, 0 replies; 16+ messages in thread
From: Pedro Alves @ 2015-07-24 18:17 UTC (permalink / raw)
  To: Simon Marchi, Breazeal, Don; +Cc: GDB Patches

On 07/24/2015 07:05 PM, Simon Marchi wrote:

> I tried it and it works as expected.  If you try the same test program in all-stop
> though, fork childs are left stopped.  Is it expected?  I am not sure how forking
> interacts with all-stop.

Yeah, in all-stop, you need "set schedule-multiple on" to let all processes run.
That seems to trip on more breakage:

...
[Thread 0x7ffff57bc700 (LWP 11703) exited]
[Thread 0x7ffff7fc1700 (LWP 11700) exited]
[New Thread 0x7ffff77c0700 (LWP 11710)]
[New Thread 0x7ffff67be700 (LWP 11709)]
[New Thread 0x7ffff57bc700 (LWP 11711)]
[New Thread 0x7ffff4fbb700 (LWP 11712)]
[New Thread 0x7ffff3fb9700 (LWP 11713)]
[New process 11702]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread 0x7ffff3fb9700 (LWP 11713) exited]
[Thread 0x7ffff4fbb700 (LWP 11712) exited]
[Thread 0x7ffff57bc700 (LWP 11711) exited]
[Thread 0x7ffff77c0700 (LWP 11710) exited]
[Thread 0x7ffff67be700 (LWP 11709) exited]
[New Thread 0x7ffff7fc1700 (LWP 11714)]
[New Thread 0x7ffff6fbf700 (LWP 11716)]
[New Thread 0x7ffff5fbd700 (LWP 11715)]
[New Thread 0x7ffff37b8700 (LWP 11717)]
[Inferior 3 (process 11634) exited normally]
Cannot find new threads: generic error
(gdb) info threads
Cannot find new threads: generic error
(gdb) info threads
Cannot find new threads: generic error
(gdb) q

ISTR that's not a new bug, but I haven't tried older releases.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode
  2015-07-23 18:21 ` [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode Pedro Alves
  2015-07-24 18:05   ` Simon Marchi
@ 2015-07-24 18:43   ` Don Breazeal
  2015-07-29 13:21     ` Pedro Alves
  1 sibling, 1 reply; 16+ messages in thread
From: Don Breazeal @ 2015-07-24 18:43 UTC (permalink / raw)
  To: Pedro Alves, Breazeal, Don, Simon Marchi; +Cc: GDB Patches

On 7/23/2015 11:21 AM, Pedro Alves wrote:
> So I managed to extract out this smaller patch from the
> gdbserver fixes I mentioned.  I think this one looks safe enough
> for 7.10.  WDYT?
> 
> -----------
> From 98d41152bff2a21f7fda864d87ee5dd0cffa2d17 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Thu, 23 Jul 2015 18:49:51 +0100
> Subject: [PATCH] remote follow fork and spurious child stops in non-stop mode
> 
> Running gdb.threads/fork-plus-threads.exp against gdbserver in
> extended-remote mode, even though the test passes, we still see broken
> behavior:
> 
> Running gdb.threads/fork-plus-threads.exp against gdbserver in
> extended-remote mode, even though the test passes, we still see broken
> behavior:
> 
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off
>  continue &
>  Continuing.
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue &
>  [New Thread 28092.28092]
> 
>  [Thread 28092.28092] #2 stopped.
>  [New Thread 28094.28094]
>  [Inferior 2 (process 28092) exited normally]
>  [New Thread 28094.28105]
>  [New Thread 28094.28109]
> 
> ...
> 
> [Thread 28174.28174] #18 stopped.
>  [New Thread 28185.28185]
>  [Inferior 10 (process 28174) exited normally]
>  [New Thread 28185.28196]
> 
>  [Thread 28185.28185] #20 stopped.
>  Cannot remove breakpoints because program is no longer writable.
>  Further execution is probably impossible.
>  [Inferior 11 (process 28185) exited normally]
>  [Inferior 1 (process 28091) exited normally]
>  PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint
>  info threads
>  No threads.
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left
>  info inferiors
>    Num  Description       Executable
>  * 1    <null>            /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left
> 
> All the "[Thread FOO] #NN stopped." above are bogus, as well as the
> "Cannot remove breakpoints because program is no longer writable.",
> which is a consequence.
> 
> The problem is that when we intercept a fork event, we should report
> the event for the parent, only, and leave the child stopped, but not
> report its stop event.  GDB later decides whether to follow the parent
> or the child.  But because handle_extended_wait does not set the
> child's last_status.kind to TARGET_WAITKIND_STOPPED, a
> stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access
> memory) by mistake ends up queueing a SIGSTOP on the child, resuming
> it, and then when that SIGSTOP is intercepted, because the LWP has
> last_resume_kind set to resume_stop, gdbserver reports the stop to
> GDB, as GDB_SIGNAL_0:
> 
> ...
>  >>>> entering unstop_all_lwps
>  unstopping all lwps
>  proceed_one_lwp: lwp 1600
>     client wants LWP to remain 1600 stopped
>  proceed_one_lwp: lwp 1828
>  Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  Sending sigstop to lwp 1828
>  pc is 0x3615ebc7cc
>  Resuming lwp 1828 (continue, signal 0, stop expected)
>    continue from pc 0x3615ebc7cc
>  unstop_all_lwps done
>  sigchld_handler
>  <<<< exiting unstop_all_lwps
>  handling possible target event
>  >>>> entering linux_wait_1
>  linux_wait_1: [<all threads>]
>  my_waitpid (-1, 0x40000001)
>  my_waitpid (-1, 0x1): status(137f), 1828
>  LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK
>  LLW: waitpid 1828 received Stopped (signal) (stopped)
>  pc is 0x3615ebc7cc
>  Expected stop.
>  LLW: resume_stop SIGSTOP caught for LWP 1828.1828.
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ...
>  linux_wait_1 ret = LWP 1828.1828, 1, 0
>  <<<< exiting linux_wait_1
>  Writing resume reply for LWP 1828.1828:1
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> By inspection, I also noticed that we miss leaving the child with the
> suspend count incremented if stopping threads, like we do for clone
> threads.
> 
> Tested on x86_64 Fedora 20, extended-remote.
> 
> gdb/gdbserver/ChangeLog:
> 2015-07-23  Pedro Alves  <palves@redhat.com>
> 
> 	* linux-low.c (handle_extended_wait): Set the child's last
> 	reported status to TARGET_WAITKIND_STOPPED.
> ---
>  gdb/gdbserver/linux-low.c                       |  7 ++++++
>  gdb/testsuite/gdb.threads/fork-plus-threads.exp | 30 +++++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
> index 17b2a51..56a33ff 100644
> --- a/gdb/gdbserver/linux-low.c
> +++ b/gdb/gdbserver/linux-low.c
> @@ -488,6 +488,13 @@ handle_extended_wait (struct lwp_info *event_lwp, int wstat)
>  	  child_lwp->status_pending_p = 0;
>  	  child_thr = get_lwp_thread (child_lwp);
>  	  child_thr->last_resume_kind = resume_stop;
> +	  child_thr->last_status.kind = TARGET_WAITKIND_STOPPED;

This makes perfect sense to me.

> +
> +	  /* If we're suspending all threads, leave this one suspended
> +	     too.  */
> +	  if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS)
> +	    child_lwp->suspended = 1;

I have a question about this.  In the definition of struct lwp_info in
linux-low.h, it has this comment:

  /* When this is true, we shall not try to resume this thread, even
     if last_resume_kind isn't resume_stop.  */
  int suspended;

Since we are setting last_resume_kind to resume_stop here, is this
unnecessary?

Thanks,
--Don

> +
>  	  parent_proc = get_thread_process (event_thr);
>  	  child_proc->attached = parent_proc->attached;
>  	  clone_all_breakpoints (&child_proc->breakpoints,
> diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> index f44dd76..80d2464 100644
> --- a/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> +++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> @@ -48,13 +48,43 @@ gdb_test_multiple $test $test {
>      }
>  }
>  
> +# gdbserver had a bug that resulted in reporting the fork child's
> +# initial stop to gdb, which gdb does not expect, in turn resulting in
> +# a broken session, like:
> +#
> +#  [Thread 31536.31536] #16 stopped.                                   <== BAD
> +#  [New Thread 31547.31547]
> +#  [Inferior 10 (process 31536) exited normally]
> +#  [New Thread 31547.31560]
> +#
> +#  [Thread 31547.31547] #18 stopped.                                   <== BAD
> +#  Cannot remove breakpoints because program is no longer writable.    <== BAD
> +#  Further execution is probably impossible.                           <== BAD
> +#  [Inferior 11 (process 31547) exited normally]
> +#  [Inferior 1 (process 31454) exited normally]
> +#
> +# These variables track whether we see such broken behavior.
> +set saw_cannot_remove_breakpoints 0
> +set saw_thread_stopped 0
> +
>  set test "reached breakpoint"
>  gdb_test_multiple "" $test {
> +    -re "Cannot remove breakpoints" {
> +	set saw_cannot_remove_breakpoints 1
> +	exp_continue
> +    }
> +    -re "Thread \[^\r\n\]+ stopped\\." {
> +	set saw_thread_stopped 1
> +	exp_continue
> +    }
>      -re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
>  	pass $test
>      }
>  }
>  
> +gdb_assert !$saw_cannot_remove_breakpoints "no failure to remove breakpoints"
> +gdb_assert !$saw_thread_stopped "no spurious thread stop"
> +
>  gdb_test "info threads" "No threads\." \
>      "no threads left"
>  
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode
  2015-07-24 18:43   ` Don Breazeal
@ 2015-07-29 13:21     ` Pedro Alves
  2015-07-29 13:38       ` Pedro Alves
  2015-07-30 18:13       ` Pedro Alves
  0 siblings, 2 replies; 16+ messages in thread
From: Pedro Alves @ 2015-07-29 13:21 UTC (permalink / raw)
  To: Don Breazeal, Breazeal, Don, Simon Marchi; +Cc: GDB Patches

[-- Attachment #1: Type: text/plain, Size: 4061 bytes --]

Hi Don,

Sorry for the delay.

On 07/24/2015 07:43 PM, Don Breazeal wrote:

>> index 17b2a51..56a33ff 100644
>> --- a/gdb/gdbserver/linux-low.c
>> +++ b/gdb/gdbserver/linux-low.c
>> @@ -488,6 +488,13 @@ handle_extended_wait (struct lwp_info *event_lwp, int wstat)
>>  	  child_lwp->status_pending_p = 0;
>>  	  child_thr = get_lwp_thread (child_lwp);
>>  	  child_thr->last_resume_kind = resume_stop;
>> +	  child_thr->last_status.kind = TARGET_WAITKIND_STOPPED;
> 
> This makes perfect sense to me.
> 

Great.

>> +
>> +	  /* If we're suspending all threads, leave this one suspended
>> +	     too.  */
>> +	  if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS)
>> +	    child_lwp->suspended = 1;
> 
> I have a question about this.  In the definition of struct lwp_info in
> linux-low.h, it has this comment:
> 
>   /* When this is true, we shall not try to resume this thread, even
>      if last_resume_kind isn't resume_stop.  */
>   int suspended;
> 
> Since we are setting last_resume_kind to resume_stop here, is this
> unnecessary?

We still need it, because otherwise we'd decrement the suspend count
below 0:

static int
unsuspend_and_proceed_one_lwp (struct inferior_list_entry *entry, void *except)
{
  struct thread_info *thread = (struct thread_info *) entry;
  struct lwp_info *lwp = get_thread_lwp (thread);

  if (lwp == except)
    return 0;

  lwp->suspended--;
  gdb_assert (lwp->suspended >= 0);

  return proceed_one_lwp (entry, except);
}

It's proceed_one_lwp that skips resuming if the client wants the
lwp stopped:

static int
proceed_one_lwp (struct inferior_list_entry *entry, void *except)
{
...
  if (thread->last_resume_kind == resume_stop
      && thread->last_status.kind != TARGET_WAITKIND_IGNORE)
    {
      if (debug_threads)
	debug_printf ("   client wants LWP to remain %ld stopped\n",
		      lwpid_of (thread));
      return 0;
    }

I tried writing a test for this, by making a multithreaded program
have all its threads but the main continuously fork (see attached), while
the main thread continuously steps over a breakpoint (a conditional
breakpoint with condition "0" should do it, as gdbserver handles
that breakpoint itself), but that stumbles on yet more problems...  :-/

$ ./gdb ./testsuite/gdb.threads/fork-plus-threads-2 -ex "set non-stop on" -ex "set detach-on-fork off" -ex "tar extended-rem :9999"
...
Remote debugging using :9999
(gdb)
[Thread 24971.24971] #1 stopped.
0x0000003615a011f0 in ?? ()
c&
Continuing.
(gdb) [New Thread 24971.24981]
[New Thread 24983.24983]
[New Thread 24971.24982]

[Thread 24983.24983] #3 stopped.
0x0000003615ebc7cc in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/fork.c:130
130       pid = ARCH_FORK ();
[New Thread 24984.24984]
Error in re-setting breakpoint -16: PC register is not available
Error in re-setting breakpoint -17: PC register is not available
Error in re-setting breakpoint -18: PC register is not available
Error in re-setting breakpoint -19: PC register is not available
Error in re-setting breakpoint -24: PC register is not available
Error in re-setting breakpoint -25: PC register is not available
Error in re-setting breakpoint -26: PC register is not available
Error in re-setting breakpoint -27: PC register is not available
Error in re-setting breakpoint -28: PC register is not available
Error in re-setting breakpoint -29: PC register is not available
Error in re-setting breakpoint -30: PC register is not available
PC register is not available
(gdb)

>>  set test "reached breakpoint"

BTW, I noticed that this test message is stale from my previous attempt
at running to a breakpoint instead of to exit.  I changed it to:

 set test "inferior 1 exited"

in patch 1/2.

>>  gdb_test_multiple "" $test {
>> +    -re "Cannot remove breakpoints" {
>> +	set saw_cannot_remove_breakpoints 1
>> +	exp_continue
>> +    }
>> +    -re "Thread \[^\r\n\]+ stopped\\." {
>> +	set saw_thread_stopped 1
>> +	exp_continue
>> +    }
>>      -re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
>>  	pass $test
>>      }
>>  }

Thanks,
Pedro Alves

[-- Attachment #2: fork-plus-threads-2.c --]
[-- Type: text/plain, Size: 2029 bytes --]

/* This testcase is part of GDB, the GNU debugger.

   Copyright 2015 Free Software Foundation, Inc.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 3 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */

#include <assert.h>
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdlib.h>

/* Number of threads.  Each thread continuously spawns a fork and wait
   for it.  If we have another thread continuously start a step over,
   gdbserver should end up finding new forks while suspending
   threads.  */
#define NTHREADS 10

pthread_t threads[NTHREADS];

static void *
thread_func (void *arg)
{
  while (1)
    {
      pid_t pid;

      pid = fork ();

      if (pid > 0)
	{
	  int status;

	  /* Parent.  */
	  pid = waitpid (pid, &status, 0);
	  if (pid == -1)
	    {
	      perror ("wait");
	      exit (1);
	    }

	  if (!WIFEXITED (status))
	    {
	      printf ("Unexpected wait status 0x%x from child %d\n",
		      status, pid);
	    }
	}
      else if (pid == 0)
	{
	  /* Child.  */
	  exit (0);
	}
      else
	{
	  perror ("fork");
	  exit (1);
	}
    }
}

int
main (void)
{
  int i;
  int ret;

  for (i = 0; i < NTHREADS; i++)
    {
      ret = pthread_create (&threads[i], NULL, thread_func, NULL);
      assert (ret == 0);
    }

  for (i = 0; i < NTHREADS; i++)
    {
      ret = pthread_join (threads[i], NULL);
      assert (ret == 0);
    }

  /* Don't run forever.  */
  sleep (180);

  return 0;
}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode
  2015-07-29 13:21     ` Pedro Alves
@ 2015-07-29 13:38       ` Pedro Alves
  2015-07-29 14:23         ` Pedro Alves
  2015-07-30 18:13       ` Pedro Alves
  1 sibling, 1 reply; 16+ messages in thread
From: Pedro Alves @ 2015-07-29 13:38 UTC (permalink / raw)
  To: Don Breazeal, Breazeal, Don, Simon Marchi; +Cc: GDB Patches

On 07/29/2015 02:21 PM, Pedro Alves wrote:

> 
> I tried writing a test for this, by making a multithreaded program
> have all its threads but the main continuously fork (see attached), while
> the main thread continuously steps over a breakpoint (a conditional
> breakpoint with condition "0" should do it, as gdbserver handles
> that breakpoint itself), but that stumbles on yet more problems...  :-/
> 
> $ ./gdb ./testsuite/gdb.threads/fork-plus-threads-2 -ex "set non-stop on" -ex "set detach-on-fork off" -ex "tar extended-rem :9999"
> ...
> Remote debugging using :9999
> (gdb)
> [Thread 24971.24971] #1 stopped.
> 0x0000003615a011f0 in ?? ()
> c&
> Continuing.
> (gdb) [New Thread 24971.24981]
> [New Thread 24983.24983]
> [New Thread 24971.24982]
> 
> [Thread 24983.24983] #3 stopped.
> 0x0000003615ebc7cc in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/fork.c:130
> 130       pid = ARCH_FORK ();
> [New Thread 24984.24984]
> Error in re-setting breakpoint -16: PC register is not available
> Error in re-setting breakpoint -17: PC register is not available
> Error in re-setting breakpoint -18: PC register is not available
> Error in re-setting breakpoint -19: PC register is not available
> Error in re-setting breakpoint -24: PC register is not available
> Error in re-setting breakpoint -25: PC register is not available
> Error in re-setting breakpoint -26: PC register is not available
> Error in re-setting breakpoint -27: PC register is not available
> Error in re-setting breakpoint -28: PC register is not available
> Error in re-setting breakpoint -29: PC register is not available
> Error in re-setting breakpoint -30: PC register is not available
> PC register is not available
> (gdb)
> 

Hmm, gdbserver's logs (for a different run) show:

...
HEW: Got clone event from LWP 25962, new child is LWP 25989
my_waitpid (-1, 0x40000001)
my_waitpid (-1, 0x1): status(137f), 25990
LWFE: waitpid(-1, ...) returned 25990, ERRNO-OK
LLW: waitpid 25990 received Stopped (signal) (stopped)
my_waitpid (-1, 0x40000001)
my_waitpid (-1, 0x1): status(0), 25988
LWFE: waitpid(-1, ...) returned 25988, ERRNO-OK
LLW: waitpid 25988 received 0 (exited)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
LLFE: 25988 exited.
^^^^^^^^^^^^^^^^^^^
my_waitpid (-1, 0x40000001)
my_waitpid (-1, 0x80000001): status(1057f), 25973
LWFE: waitpid(-1, ...) returned 25973, ERRNO-OK
LLW: waitpid 25973 received Trace/breakpoint trap (stopped)
pc is 0x3615ebc7cc
HEW: Got fork event from LWP 25973, new child is 25990
pc is 0x3615ebc7cc
pc is 0x3615ebc7cc
my_waitpid (-1, 0x40000001)
my_waitpid (-1, 0x80000001): status(117f), 25972
LWFE: waitpid(-1, ...) returned 25972, ERRNO-OK
LLW: waitpid 25972 received Child exited (stopped)
pc is 0x3616a0f279
my_waitpid (-1, 0x40000001)
my_waitpid (-1, 0x80000001): status(117f), 0
LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
RSRL: resuming stopped-resumed LWP LWP 25962.25962 at 3615ef4ce1: step=0
pc is 0x3615ef4ce1
Resuming lwp 25962 (continue, signal 0, stop not expected)
  continue from pc 0x3615ef4ce1
RSRL: resuming stopped-resumed LWP LWP 25962.25989 at 0: step=0
pc is 0x3615ef4ce1
Resuming lwp 25989 (continue, signal 0, stop not expected)
  continue from pc 0x3615ef4ce1
sigchld_handler
Ignored signal 17 for LWP 25972.
pc is 0x3616a0f279
Resuming lwp 25972 (continue, signal 17, stop not expected)
  continue from pc 0x3616a0f279
handling possible target event
>>>> entering linux_wait_1
linux_wait_1: [<all threads>]
Got a pending child 25973
Got an event from pending child 25973 (1057f)
Hit a non-gdbserver trap event.
SEL: Found 2 SIGTRAP events, selecting #1
linux_wait_1 ret = LWP 25988.25988, 1, 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  ("1" is TARGET_WAITKIND_STOPPED)
<<<< exiting linux_wait_1
Writing resume reply for LWP 25988.25988:1
ptrace(regsets_fetch_inferior_registers) PID=25988: No such process
ptrace(regsets_fetch_inferior_registers) PID=25988: No such process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Ignore the "SIGTRAP" mention in "SEL: Found 2 SIGTRAP events",
it's "two events".  And the one that was picked was a process
exit.  But the tail end of linux_wait_1 isn't expecting that
can happen.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode
  2015-07-29 13:38       ` Pedro Alves
@ 2015-07-29 14:23         ` Pedro Alves
  2015-07-29 15:40           ` Pedro Alves
  0 siblings, 1 reply; 16+ messages in thread
From: Pedro Alves @ 2015-07-29 14:23 UTC (permalink / raw)
  To: Don Breazeal, Breazeal, Don, Simon Marchi; +Cc: GDB Patches

On 07/29/2015 02:38 PM, Pedro Alves wrote:

> Ignore the "SIGTRAP" mention in "SEL: Found 2 SIGTRAP events",
> it's "two events".  And the one that was picked was a process
> exit.  But the tail end of linux_wait_1 isn't expecting that
> can happen.
>

This seems to fix it (and ends up making the code a little
more like linux-nat.c).  I'm running it through the testsuite.

--------
From: Pedro Alves <palves@redhat.com>
Date: 2015-07-29 14:40:04 +0100

fix
---

 gdb/gdbserver/linux-low.c |   67 ++++++++++++++++++++++++++-------------------
 gdb/gdbserver/linux-low.h |   11 +++----
 2 files changed, 42 insertions(+), 36 deletions(-)

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 9eaa912..99a44f9 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -264,6 +264,7 @@ static int linux_wait_for_event (ptid_t ptid, int *wstat, int options);
 static struct lwp_info *add_lwp (ptid_t ptid);
 static int linux_stopped_by_watchpoint (void);
 static void mark_lwp_dead (struct lwp_info *lwp, int wstat);
+static int lwp_is_marked_dead (struct lwp_info *lwp);
 static void proceed_all_lwps (void);
 static int finish_step_over (struct lwp_info *lwp);
 static int kill_lwp (unsigned long lwpid, int signo);
@@ -755,9 +756,9 @@ add_lwp (ptid_t ptid)
 {
   struct lwp_info *lwp;
 
-  lwp = (struct lwp_info *) xmalloc (sizeof (*lwp));
-  memset (lwp, 0, sizeof (*lwp));
+  lwp = (struct lwp_info *) xcalloc (1, sizeof (*lwp));
 
+  lwp->waitstatus.kind = TARGET_WAITKIND_IGNORE;
   if (the_low_target.new_thread != NULL)
     the_low_target.new_thread (lwp);
 
@@ -1397,7 +1398,7 @@ linux_thread_alive (ptid_t ptid)
      exited but we still haven't been able to report it to GDB, we'll
      hold on to the last lwp of the dead process.  */
   if (lwp != NULL)
-    return !lwp->dead;
+    return !lwp_is_marked_dead (lwp);
   else
     return 0;
 }
@@ -2741,20 +2742,6 @@ ignore_event (struct target_waitstatus *ourstatus)
   return null_ptid;
 }
 
-/* Return non-zero if WAITSTATUS reflects an extended linux
-   event.  Otherwise, return zero.  */
-
-static int
-extended_event_reported (const struct target_waitstatus *waitstatus)
-{
-  if (waitstatus == NULL)
-    return 0;
-
-  return (waitstatus->kind == TARGET_WAITKIND_FORKED
-	  || waitstatus->kind == TARGET_WAITKIND_VFORKED
-	  || waitstatus->kind == TARGET_WAITKIND_VFORK_DONE);
-}
-
 /* Wait for process, returns status.  */
 
 static ptid_t
@@ -3122,7 +3109,7 @@ linux_wait_1 (ptid_t ptid,
 		   || (gdb_breakpoint_here (event_child->stop_pc)
 		       && gdb_condition_true_at_breakpoint (event_child->stop_pc)
 		       && gdb_no_commands_at_breakpoint (event_child->stop_pc))
-		   || extended_event_reported (&event_child->waitstatus));
+		   || event_child->waitstatus.kind != TARGET_WAITKIND_IGNORE);
 
   run_breakpoint_commands (event_child->stop_pc);
 
@@ -3144,9 +3131,11 @@ linux_wait_1 (ptid_t ptid,
 			  paddress (event_child->stop_pc),
 			  paddress (event_child->step_range_start),
 			  paddress (event_child->step_range_end));
-	  if (extended_event_reported (&event_child->waitstatus))
+	  if (event_child->waitstatus.kind != TARGET_WAITKIND_IGNORE)
 	    {
-	      char *str = target_waitstatus_to_string (ourstatus);
+	      char *str;
+
+	      str = target_waitstatus_to_string (&event_child->waitstatus);
 	      debug_printf ("LWP %ld: extended event with waitstatus %s\n",
 			    lwpid_of (get_lwp_thread (event_child)), str);
 	      xfree (str);
@@ -3260,12 +3249,11 @@ linux_wait_1 (ptid_t ptid,
 	unstop_all_lwps (1, event_child);
     }
 
-  if (extended_event_reported (&event_child->waitstatus))
+  if (event_child->waitstatus.kind != TARGET_WAITKIND_IGNORE)
     {
-      /* If the reported event is a fork, vfork or exec, let GDB know.  */
-      ourstatus->kind = event_child->waitstatus.kind;
-      ourstatus->value = event_child->waitstatus.value;
-
+      /* If the reported event is an exit, fork, vfork or exec, let
+	 GDB know.  */
+      *ourstatus = event_child->waitstatus;
       /* Clear the event lwp's waitstatus since we handled it already.  */
       event_child->waitstatus.kind = TARGET_WAITKIND_IGNORE;
     }
@@ -3473,13 +3461,23 @@ suspend_and_send_sigstop_callback (struct inferior_list_entry *entry,
 static void
 mark_lwp_dead (struct lwp_info *lwp, int wstat)
 {
-  /* It's dead, really.  */
-  lwp->dead = 1;
-
   /* Store the exit status for later.  */
   lwp->status_pending_p = 1;
   lwp->status_pending = wstat;
 
+  /* Store in waitstatus as well, as there's nothing else to process
+     for this event.  */
+  if (WIFEXITED (wstat))
+    {
+      lwp->waitstatus.kind = TARGET_WAITKIND_EXITED;
+      lwp->waitstatus.value.integer = WEXITSTATUS (wstat);
+    }
+  else if (WIFSIGNALED (wstat))
+    {
+      lwp->waitstatus.kind = TARGET_WAITKIND_SIGNALLED;
+      lwp->waitstatus.value.sig = gdb_signal_from_host (WTERMSIG (wstat));
+    }
+
   /* Prevent trying to stop it.  */
   lwp->stopped = 1;
 
@@ -3487,6 +3485,17 @@ mark_lwp_dead (struct lwp_info *lwp, int wstat)
   lwp->stop_expected = 0;
 }
 
+/* Return true if LWP has exited already, and has a pending exit event
+   to report to GDB.  */
+
+static int
+lwp_is_marked_dead (struct lwp_info *lwp)
+{
+  return (lwp->status_pending_p
+	  && (WIFEXITED (lwp->status_pending)
+	      || WIFSIGNALED (lwp->status_pending)));
+}
+
 /* Wait for all children to stop for the SIGSTOPs we just queued.  */
 
 static void
@@ -3603,7 +3612,7 @@ lwp_running (struct inferior_list_entry *entry, void *data)
   struct thread_info *thread = (struct thread_info *) entry;
   struct lwp_info *lwp = get_thread_lwp (thread);
 
-  if (lwp->dead)
+  if (lwp_is_marked_dead (lwp))
     return 0;
   if (lwp->stopped)
     return 0;
diff --git a/gdb/gdbserver/linux-low.h b/gdb/gdbserver/linux-low.h
index 3300da9..0f2421b 100644
--- a/gdb/gdbserver/linux-low.h
+++ b/gdb/gdbserver/linux-low.h
@@ -266,16 +266,13 @@ struct lwp_info
      event already received in a wait()).  */
   int stopped;
 
-  /* If this flag is set, the lwp is known to be dead already (exit
-     event already received in a wait(), and is cached in
-     status_pending).  */
-  int dead;
-
   /* When stopped is set, the last wait status recorded for this lwp.  */
   int last_status;
 
-  /* This is used to store extended ptrace event information until
-     it is reported to GDB.  */
+  /* If WAITSTATUS->KIND != TARGET_WAITKIND_IGNORE, the waitstatus for
+     this LWP's last event, to pass to GDB without any further
+     processing.  This is used to store extended ptrace event
+     information or exit status until it can be reported to GDB.  */
   struct target_waitstatus waitstatus;
 
   /* When stopped is set, this is where the lwp last stopped, with

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode
  2015-07-29 14:23         ` Pedro Alves
@ 2015-07-29 15:40           ` Pedro Alves
  2015-07-29 16:40             ` Pedro Alves
  0 siblings, 1 reply; 16+ messages in thread
From: Pedro Alves @ 2015-07-29 15:40 UTC (permalink / raw)
  To: Don Breazeal, Breazeal, Don, Simon Marchi; +Cc: GDB Patches

On 07/29/2015 03:23 PM, Pedro Alves wrote:
> On 07/29/2015 02:38 PM, Pedro Alves wrote:
> 
>> Ignore the "SIGTRAP" mention in "SEL: Found 2 SIGTRAP events",
>> it's "two events".  And the one that was picked was a process
>> exit.  But the tail end of linux_wait_1 isn't expecting that
>> can happen.
>>
> 
> This seems to fix it (and ends up making the code a little
> more like linux-nat.c).  I'm running it through the testsuite.

It passes cleanly.

And I confirmed that that test idea triggers the suspend count
assertion I suspected:

Child exited with status 0
/home/pedro/gdb/mygit/build/../src/gdb/gdbserver/linux-low.c:2619: A problem internal to GDBserver has been detected.
unsuspend_one_lwp: Assertion `lwp->suspended >= 0' failed.
[Inferior 1 (process 32473) exited with code 01]
(gdb)

I'll convert that to a proper test.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode
  2015-07-29 15:40           ` Pedro Alves
@ 2015-07-29 16:40             ` Pedro Alves
  0 siblings, 0 replies; 16+ messages in thread
From: Pedro Alves @ 2015-07-29 16:40 UTC (permalink / raw)
  To: Don Breazeal, Breazeal, Don, Simon Marchi; +Cc: GDB Patches

On 07/29/2015 04:40 PM, Pedro Alves wrote:
> On 07/29/2015 03:23 PM, Pedro Alves wrote:
>> On 07/29/2015 02:38 PM, Pedro Alves wrote:
>>
>>> Ignore the "SIGTRAP" mention in "SEL: Found 2 SIGTRAP events",
>>> it's "two events".  And the one that was picked was a process
>>> exit.  But the tail end of linux_wait_1 isn't expecting that
>>> can happen.
>>>
>>
>> This seems to fix it (and ends up making the code a little
>> more like linux-nat.c).  I'm running it through the testsuite.
> 
> It passes cleanly.
> 
> And I confirmed that that test idea triggers the suspend count
> assertion I suspected:
> 
> Child exited with status 0
> /home/pedro/gdb/mygit/build/../src/gdb/gdbserver/linux-low.c:2619: A problem internal to GDBserver has been detected.
> unsuspend_one_lwp: Assertion `lwp->suspended >= 0' failed.
> [Inferior 1 (process 32473) exited with code 01]
> (gdb)
> 
> I'll convert that to a proper test.
> 

And that that exposes more issues, like:

[New Thread 27183.27382]
[New Thread 27183.27684]
/home/pedro/gdb/mygit/build/../src/gdb/thread.c:936: internal-error: finish_thread_state: Assertion `tp' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.threads/fork-plus-threads-2.exp: inferior 1 exited (GDB internal error)
Resyncing due to internal error.
n

...

Detaching from process 28486
Detaching from process 28487
Detaching from process 28488
/home/pedro/gdb/mygit/build/../src/gdb/gdbserver/linux-low.c:3569: A problem internal to GDBserver has been detected.
stuck_in_jump_pad_callback: Assertion `lwp->suspended == 0' failed.
testcase /home/pedro/gdb/mygit/build/../src/gdb/testsuite/gdb.threads/fork-plus-threads-2.exp completed in 3 seconds

                === gdb Summary ===

Looks like I found myself a nice deep rat hole...

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pushed] Re: [PATCH v2 1/2] PR threads/18600: Threads left stopped after fork+thread spawn
  2015-07-23 17:25 ` [PATCH v2 1/2] PR threads/18600: Threads left stopped " Pedro Alves
@ 2015-07-30 18:08   ` Pedro Alves
  0 siblings, 0 replies; 16+ messages in thread
From: Pedro Alves @ 2015-07-30 18:08 UTC (permalink / raw)
  To: gdb-patches

On 07/23/2015 06:24 PM, Pedro Alves wrote:
> When a program forks and another process start threads while gdb is
> handling the fork event, newly created threads are left stuck stopped
> by gdb, even though gdb presents them as "running", to the user.
> 
> This can be seen with the test added by this patch.  The test has the
> inferior fork a certain number of times and waits for all children to
> exit.  Each fork child spawns a number of threads that do nothing and
> joins them immediately.  Normally, the program should run unimpeded
> (from the point of view of the user) and exit very quickly.  Without
> this fix, it doesn't because of some threads left stopped by gdb, so
> inferior 1 never exits.
> 
> The program triggers when a new clone thread is found while inside the
> linux_stop_and_wait_all_lwps call in linux-thread-db.c:
> 
>       linux_stop_and_wait_all_lwps ();
> 
>       ALL_LWPS (lp)
> 	if (ptid_get_pid (lp->ptid) == pid)
> 	  thread_from_lwp (lp->ptid);
> 
>       linux_unstop_all_lwps ();
> 
> Within linux_stop_and_wait_all_lwps, we reach
> linux_handle_extended_wait with the "stopping" parameter set to 1, and
> because of that we don't mark the new lwp as resumed.  As consequence,
> the subsequent resume_stopped_resumed_lwps, called from
> linux_unstop_all_lwps, never resumes the new LWP.
> 
> There's lots of cruft in linux_handle_extended_wait that no longer
> makes sense.  On systems with CLONE events support, we don't rely on
> libthread_db for thread listing anymore, so the code that preserves
> stop_requested and the handling of last_resume_kind is all dead.
> 
> So the fix is to remove all that, and simply always mark the new LWP
> as resumed, so that resume_stopped_resumed_lwps re-resumes it.
> 
> gdb/ChangeLog:
> 2015-07-23  Pedro Alves  <palves@redhat.com>
> 	    Simon Marchi  <simon.marchi@ericsson.com>
> 
> 	PR threads/18600
> 	* linux-nat.c (linux_handle_extended_wait): On CLONE event, always
> 	mark the new thread as resumed.  Remove STOPPING parameter.
> 	(wait_lwp): Adjust call to linux_handle_extended_wait.
> 	(linux_nat_filter_event): Adjust call to
> 	linux_handle_extended_wait.
> 	(resume_stopped_resumed_lwps): Add debug output.
> 
> gdb/testsuite/ChangeLog:
> 2015-07-23  Simon Marchi  <simon.marchi@ericsson.com>
> 	    Pedro Alves  <palves@redhat.com>
> 
> 	PR threads/18600
> 	* gdb.threads/fork-plus-threads.c: New file.
> 	* gdb.threads/fork-plus-threads.exp: New file.

Here's what I pushed, to both master and 7.10 branch.
Mostly the same as before, but I made the test check both
set detach-on-fork on/off.

From 4dd63d488a76482543517c4c4cde699ee6fa33ef Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Thu, 30 Jul 2015 18:50:29 +0100
Subject: [PATCH 1/3] PR threads/18600: Threads left stopped after fork+thread
 spawn

When a program forks and another process start threads while gdb is
handling the fork event, newly created threads are left stuck stopped
by gdb, even though gdb presents them as "running", to the user.

This can be seen with the test added by this patch.  The test has the
inferior fork a certain number of times and waits for all children to
exit.  Each fork child spawns a number of threads that do nothing and
joins them immediately.  Normally, the program should run unimpeded
(from the point of view of the user) and exit very quickly.  Without
this fix, it doesn't because of some threads left stopped by gdb, so
inferior 1 never exits.

The program triggers when a new clone thread is found while inside the
linux_stop_and_wait_all_lwps call in linux-thread-db.c:

      linux_stop_and_wait_all_lwps ();

      ALL_LWPS (lp)
	if (ptid_get_pid (lp->ptid) == pid)
	  thread_from_lwp (lp->ptid);

      linux_unstop_all_lwps ();

Within linux_stop_and_wait_all_lwps, we reach
linux_handle_extended_wait with the "stopping" parameter set to 1, and
because of that we don't mark the new lwp as resumed.  As consequence,
the subsequent resume_stopped_resumed_lwps, called from
linux_unstop_all_lwps, never resumes the new LWP.

There's lots of cruft in linux_handle_extended_wait that no longer
makes sense.  On systems with CLONE events support, we don't rely on
libthread_db for thread listing anymore, so the code that preserves
stop_requested and the handling of last_resume_kind is all dead.

So the fix is to remove all that, and simply always mark the new LWP
as resumed, so that resume_stopped_resumed_lwps re-resumes it.

gdb/ChangeLog:
2015-07-30  Pedro Alves  <palves@redhat.com>
	    Simon Marchi  <simon.marchi@ericsson.com>

	PR threads/18600
	* linux-nat.c (linux_handle_extended_wait): On CLONE event, always
	mark the new thread as resumed.  Remove STOPPING parameter.
	(wait_lwp): Adjust call to linux_handle_extended_wait.
	(linux_nat_filter_event): Adjust call to
	linux_handle_extended_wait.
	(resume_stopped_resumed_lwps): Add debug output.

gdb/testsuite/ChangeLog:
2015-07-30  Simon Marchi  <simon.marchi@ericsson.com>
	    Pedro Alves  <palves@redhat.com>

	PR threads/18600
	* gdb.threads/fork-plus-threads.c: New file.
	* gdb.threads/fork-plus-threads.exp: New file.
---
 gdb/ChangeLog                                   |  11 +++
 gdb/testsuite/ChangeLog                         |   7 ++
 gdb/linux-nat.c                                 |  97 +++++++++-----------
 gdb/testsuite/gdb.threads/fork-plus-threads.c   | 115 ++++++++++++++++++++++++
 gdb/testsuite/gdb.threads/fork-plus-threads.exp |  69 ++++++++++++++
 5 files changed, 243 insertions(+), 56 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/fork-plus-threads.c
 create mode 100644 gdb/testsuite/gdb.threads/fork-plus-threads.exp

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 4d604de..40403f9 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,14 @@
+2015-07-30  Pedro Alves  <palves@redhat.com>
+	    Simon Marchi  <simon.marchi@ericsson.com>
+
+	PR threads/18600
+	* linux-nat.c (linux_handle_extended_wait): On CLONE event, always
+	mark the new thread as resumed.  Remove STOPPING parameter.
+	(wait_lwp): Adjust call to linux_handle_extended_wait.
+	(linux_nat_filter_event): Adjust call to
+	linux_handle_extended_wait.
+	(resume_stopped_resumed_lwps): Add debug output.
+
 2015-07-30  Pierre Langlois  <pierre.langlois@arm.com>
 
 	* arch-utils.c (default_fast_tracepoint_valid_at): Remove unused
diff --git a/gdb/testsuite/ChangeLog b/gdb/testsuite/ChangeLog
index 171784e..06ca987 100644
--- a/gdb/testsuite/ChangeLog
+++ b/gdb/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2015-07-30  Simon Marchi  <simon.marchi@ericsson.com>
+	    Pedro Alves  <palves@redhat.com>
+
+	PR threads/18600
+	* gdb.threads/fork-plus-threads.c: New file.
+	* gdb.threads/fork-plus-threads.exp: New file.
+
 2015-07-29  Patrick Palka  <patrick@parcs.ath.cx>
 
 	* gdb.base/batch-preserve-term-settings.exp
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index b33abb0..966c6a8 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -2000,8 +2000,7 @@ linux_handle_syscall_trap (struct lwp_info *lp, int stopping)
    true, the new LWP remains stopped, otherwise it is continued.  */
 
 static int
-linux_handle_extended_wait (struct lwp_info *lp, int status,
-			    int stopping)
+linux_handle_extended_wait (struct lwp_info *lp, int status)
 {
   int pid = ptid_get_lwp (lp->ptid);
   struct target_waitstatus *ourstatus = &lp->waitstatus;
@@ -2071,7 +2070,7 @@ linux_handle_extended_wait (struct lwp_info *lp, int status,
 	ourstatus->kind = TARGET_WAITKIND_FORKED;
       else if (event == PTRACE_EVENT_VFORK)
 	ourstatus->kind = TARGET_WAITKIND_VFORKED;
-      else
+      else if (event == PTRACE_EVENT_CLONE)
 	{
 	  struct lwp_info *new_lp;
 
@@ -2086,43 +2085,7 @@ linux_handle_extended_wait (struct lwp_info *lp, int status,
 	  new_lp = add_lwp (ptid_build (ptid_get_pid (lp->ptid), new_pid, 0));
 	  new_lp->cloned = 1;
 	  new_lp->stopped = 1;
-
-	  if (WSTOPSIG (status) != SIGSTOP)
-	    {
-	      /* This can happen if someone starts sending signals to
-		 the new thread before it gets a chance to run, which
-		 have a lower number than SIGSTOP (e.g. SIGUSR1).
-		 This is an unlikely case, and harder to handle for
-		 fork / vfork than for clone, so we do not try - but
-		 we handle it for clone events here.  We'll send
-		 the other signal on to the thread below.  */
-
-	      new_lp->signalled = 1;
-	    }
-	  else
-	    {
-	      struct thread_info *tp;
-
-	      /* When we stop for an event in some other thread, and
-		 pull the thread list just as this thread has cloned,
-		 we'll have seen the new thread in the thread_db list
-		 before handling the CLONE event (glibc's
-		 pthread_create adds the new thread to the thread list
-		 before clone'ing, and has the kernel fill in the
-		 thread's tid on the clone call with
-		 CLONE_PARENT_SETTID).  If that happened, and the core
-		 had requested the new thread to stop, we'll have
-		 killed it with SIGSTOP.  But since SIGSTOP is not an
-		 RT signal, it can only be queued once.  We need to be
-		 careful to not resume the LWP if we wanted it to
-		 stop.  In that case, we'll leave the SIGSTOP pending.
-		 It will later be reported as GDB_SIGNAL_0.  */
-	      tp = find_thread_ptid (new_lp->ptid);
-	      if (tp != NULL && tp->stop_requested)
-		new_lp->last_resume_kind = resume_stop;
-	      else
-		status = 0;
-	    }
+	  new_lp->resumed = 1;
 
 	  /* If the thread_db layer is active, let it record the user
 	     level thread id and status, and add the thread to GDB's
@@ -2136,19 +2099,23 @@ linux_handle_extended_wait (struct lwp_info *lp, int status,
 	    }
 
 	  /* Even if we're stopping the thread for some reason
-	     internal to this module, from the user/frontend's
-	     perspective, this new thread is running.  */
+	     internal to this module, from the perspective of infrun
+	     and the user/frontend, this new thread is running until
+	     it next reports a stop.  */
 	  set_running (new_lp->ptid, 1);
-	  if (!stopping)
-	    {
-	      set_executing (new_lp->ptid, 1);
-	      /* thread_db_attach_lwp -> lin_lwp_attach_lwp forced
-		 resume_stop.  */
-	      new_lp->last_resume_kind = resume_continue;
-	    }
+	  set_executing (new_lp->ptid, 1);
 
-	  if (status != 0)
+	  if (WSTOPSIG (status) != SIGSTOP)
 	    {
+	      /* This can happen if someone starts sending signals to
+		 the new thread before it gets a chance to run, which
+		 have a lower number than SIGSTOP (e.g. SIGUSR1).
+		 This is an unlikely case, and harder to handle for
+		 fork / vfork than for clone, so we do not try - but
+		 we handle it for clone events here.  */
+
+	      new_lp->signalled = 1;
+
 	      /* We created NEW_LP so it cannot yet contain STATUS.  */
 	      gdb_assert (new_lp->status == 0);
 
@@ -2162,7 +2129,6 @@ linux_handle_extended_wait (struct lwp_info *lp, int status,
 	      new_lp->status = status;
 	    }
 
-	  new_lp->resumed = !stopping;
 	  return 1;
 	}
 
@@ -2353,7 +2319,7 @@ wait_lwp (struct lwp_info *lp)
 	fprintf_unfiltered (gdb_stdlog,
 			    "WL: Handling extended status 0x%06x\n",
 			    status);
-      linux_handle_extended_wait (lp, status, 1);
+      linux_handle_extended_wait (lp, status);
       return 0;
     }
 
@@ -3155,7 +3121,7 @@ linux_nat_filter_event (int lwpid, int status)
 	fprintf_unfiltered (gdb_stdlog,
 			    "LLW: Handling extended status 0x%06x\n",
 			    status);
-      if (linux_handle_extended_wait (lp, status, 0))
+      if (linux_handle_extended_wait (lp, status))
 	return NULL;
     }
 
@@ -3675,9 +3641,28 @@ resume_stopped_resumed_lwps (struct lwp_info *lp, void *data)
 {
   ptid_t *wait_ptid_p = data;
 
-  if (lp->stopped
-      && lp->resumed
-      && !lwp_status_pending_p (lp))
+  if (!lp->stopped)
+    {
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog,
+			    "RSRL: NOT resuming LWP %s, not stopped\n",
+			    target_pid_to_str (lp->ptid));
+    }
+  else if (!lp->resumed)
+    {
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog,
+			    "RSRL: NOT resuming LWP %s, not resumed\n",
+			    target_pid_to_str (lp->ptid));
+    }
+  else if (lwp_status_pending_p (lp))
+    {
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog,
+			    "RSRL: NOT resuming LWP %s, has pending status\n",
+			    target_pid_to_str (lp->ptid));
+    }
+  else
     {
       struct regcache *regcache = get_thread_regcache (lp->ptid);
       struct gdbarch *gdbarch = get_regcache_arch (regcache);
diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.c b/gdb/testsuite/gdb.threads/fork-plus-threads.c
new file mode 100644
index 0000000..780a4b8
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/fork-plus-threads.c
@@ -0,0 +1,115 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2015 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <pthread.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+
+/* Number of times the main process forks.  */
+#define NFORKS 10
+
+/* Number of threads by each fork child.  */
+#define NTHREADS 10
+
+static void *
+thread_func (void *arg)
+{
+  /* Empty.  */
+}
+
+static void
+fork_child (void)
+{
+  pthread_t threads[NTHREADS];
+  int i;
+  int ret;
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      ret = pthread_create (&threads[i], NULL, thread_func, NULL);
+      assert (ret == 0);
+    }
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      ret = pthread_join (threads[i], NULL);
+      assert (ret == 0);
+    }
+}
+
+int
+main (void)
+{
+  pid_t childs[NFORKS];
+  int i;
+  int status;
+  int num_exited = 0;
+
+  /* Don't run forever if the wait loop below gets stuck.  */
+  alarm (180);
+
+  for (i = 0; i < NFORKS; i++)
+    {
+      pid_t pid;
+
+      pid = fork ();
+
+      if (pid > 0)
+	{
+	  /* Parent.  */
+	  childs[i] = pid;
+	}
+      else if (pid == 0)
+	{
+	  /* Child.  */
+	  fork_child ();
+	  return 0;
+	}
+      else
+	{
+	  perror ("fork");
+	  return 1;
+	}
+    }
+
+  while (num_exited != NFORKS)
+    {
+      pid_t pid = wait (&status);
+
+      if (pid == -1)
+	{
+	  perror ("wait");
+	  return 1;
+	}
+
+      if (WIFEXITED (status))
+	{
+	  num_exited++;
+	}
+      else
+	{
+	  printf ("Hmm, unexpected wait status 0x%x from child %d\n", status,
+		  pid);
+	}
+    }
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
new file mode 100644
index 0000000..53d1102
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
@@ -0,0 +1,69 @@
+# Copyright (C) 2015 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This test verifies that threads created by the child fork are
+# properly handled.  Specifically, GDB used to have a bug where it
+# would leave child fork threads stuck stopped, even though "info
+# threads" would show them running.
+#
+# See https://sourceware.org/bugzilla/show_bug.cgi?id=18600
+
+standard_testfile
+
+proc do_test { detach_on_fork } {
+    global GDBFLAGS
+    global srcfile testfile
+    global gdb_prompt
+
+    set saved_gdbflags $GDBFLAGS
+    set GDBFLAGS [concat $GDBFLAGS " -ex \"set non-stop on\""]
+
+    if {[prepare_for_testing "failed to prepare" \
+	     $testfile $srcfile {debug pthreads}] == -1} {
+	set GDBFLAGS $saved_gdbflags
+	return -1
+    }
+
+    set GDBFLAGS $saved_gdbflags
+
+    if ![runto_main] then {
+	fail "Can't run to main"
+	return 0
+    }
+
+    gdb_test_no_output "set detach-on-fork $detach_on_fork"
+    set test "continue &"
+    gdb_test_multiple $test $test {
+	-re "$gdb_prompt " {
+	    pass $test
+	}
+    }
+
+    set test "inferior 1 exited"
+    gdb_test_multiple "" $test {
+	-re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
+	    pass $test
+	}
+    }
+
+    gdb_test "info threads" "No threads\." \
+	"no threads left"
+}
+
+foreach detach_on_fork {"on" "off"} {
+    with_test_prefix "detach-on-fork=$detach_on_fork" {
+	do_test $detach_on_fork
+    }
+}
-- 
1.9.3


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode
  2015-07-29 13:21     ` Pedro Alves
  2015-07-29 13:38       ` Pedro Alves
@ 2015-07-30 18:13       ` Pedro Alves
  2015-07-30 18:15         ` Simon Marchi
  1 sibling, 1 reply; 16+ messages in thread
From: Pedro Alves @ 2015-07-30 18:13 UTC (permalink / raw)
  To: Don Breazeal, Breazeal, Don, Simon Marchi; +Cc: GDB Patches

On 07/29/2015 02:21 PM, Pedro Alves wrote:
>> > I have a question about this.  In the definition of struct lwp_info in
>> > linux-low.h, it has this comment:
>> > 
>> >   /* When this is true, we shall not try to resume this thread, even
>> >      if last_resume_kind isn't resume_stop.  */
>> >   int suspended;
>> > 
>> > Since we are setting last_resume_kind to resume_stop here, is this
>> > unnecessary?
> We still need it, because otherwise we'd decrement the suspend count
> below 0:

I think that today I fixed most of the issues this uncovered that I showed
yesterday, but there's still more to do, so I removed that bit from the
patch, and pushed it, to both master and 7.10, in order to close PR18600,
and remove it from the 7.10 blockers list.  Not sure yet whether the
other new fixes will make it into 7.10.

Below's what I pushed.

-----------
From 998d452ac81bc240996c967dd27f7b747240cd66 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Thu, 30 Jul 2015 18:41:44 +0100
Subject: [PATCH 3/3] remote follow fork and spurious child stops in non-stop
 mode

Running gdb.threads/fork-plus-threads.exp against gdbserver in
extended-remote mode, even though the test passes, we still see broken
behavior:

 (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off
 continue &
 Continuing.
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue &
 [New Thread 28092.28092]

 [Thread 28092.28092] #2 stopped.
 [New Thread 28094.28094]
 [Inferior 2 (process 28092) exited normally]
 [New Thread 28094.28105]
 [New Thread 28094.28109]

...

[Thread 28174.28174] #18 stopped.
 [New Thread 28185.28185]
 [Inferior 10 (process 28174) exited normally]
 [New Thread 28185.28196]

 [Thread 28185.28185] #20 stopped.
 Cannot remove breakpoints because program is no longer writable.
 Further execution is probably impossible.
 [Inferior 11 (process 28185) exited normally]
 [Inferior 1 (process 28091) exited normally]
 PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint
 info threads
 No threads.
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left
 info inferiors
   Num  Description       Executable
 * 1    <null>            /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left

All the "[Thread FOO] #NN stopped." above are bogus, as well as the
"Cannot remove breakpoints because program is no longer writable.",
which is a consequence.

The problem is that when we intercept a fork event, we should report
the event for the parent, only, and leave the child stopped, but not
report its stop event.  GDB later decides whether to follow the parent
or the child.  But because handle_extended_wait does not set the
child's last_status.kind to TARGET_WAITKIND_STOPPED, a
stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access
memory) by mistake ends up queueing a SIGSTOP on the child, resuming
it, and then when that SIGSTOP is intercepted, because the LWP has
last_resume_kind set to resume_stop, gdbserver reports the stop to
GDB, as GDB_SIGNAL_0:

...
 >>>> entering unstop_all_lwps
 unstopping all lwps
 proceed_one_lwp: lwp 1600
    client wants LWP to remain 1600 stopped
 proceed_one_lwp: lwp 1828
 Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Sending sigstop to lwp 1828
 pc is 0x3615ebc7cc
 Resuming lwp 1828 (continue, signal 0, stop expected)
   continue from pc 0x3615ebc7cc
 unstop_all_lwps done
 sigchld_handler
 <<<< exiting unstop_all_lwps
 handling possible target event
 >>>> entering linux_wait_1
 linux_wait_1: [<all threads>]
 my_waitpid (-1, 0x40000001)
 my_waitpid (-1, 0x1): status(137f), 1828
 LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK
 LLW: waitpid 1828 received Stopped (signal) (stopped)
 pc is 0x3615ebc7cc
 Expected stop.
 LLW: resume_stop SIGSTOP caught for LWP 1828.1828.
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
 linux_wait_1 ret = LWP 1828.1828, 1, 0
 <<<< exiting linux_wait_1
 Writing resume reply for LWP 1828.1828:1
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Tested on x86_64 Fedora 20, extended-remote.

gdb/gdbserver/ChangeLog:
2015-07-30  Pedro Alves  <palves@redhat.com>

	* linux-low.c (handle_extended_wait): Set the child's last
	reported status to TARGET_WAITKIND_STOPPED.
---
 gdb/testsuite/ChangeLog                         |  5 ++++
 gdb/gdbserver/linux-low.c                       |  2 ++
 gdb/testsuite/gdb.threads/fork-plus-threads.exp | 32 +++++++++++++++++++++++++
 3 files changed, 39 insertions(+)

diff --git a/gdb/testsuite/ChangeLog b/gdb/testsuite/ChangeLog
index e3126ed..eda6625 100644
--- a/gdb/testsuite/ChangeLog
+++ b/gdb/testsuite/ChangeLog
@@ -1,5 +1,10 @@
 2015-07-30  Pedro Alves  <palves@redhat.com>
 
+	* linux-low.c (handle_extended_wait): Set the child's last
+	reported status to TARGET_WAITKIND_STOPPED.
+
+2015-07-30  Pedro Alves  <palves@redhat.com>
+
 	PR threads/18600
 	* gdb.threads/fork-plus-threads.exp: Test that "info inferiors"
 	only shows inferior 1.
diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 9bc9fa3..82fb7f9 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -483,6 +483,8 @@ handle_extended_wait (struct lwp_info *event_lwp, int wstat)
 	  child_lwp->status_pending_p = 0;
 	  child_thr = get_lwp_thread (child_lwp);
 	  child_thr->last_resume_kind = resume_stop;
+	  child_thr->last_status.kind = TARGET_WAITKIND_STOPPED;
+
 	  parent_proc = get_thread_process (event_thr);
 	  child_proc->attached = parent_proc->attached;
 	  clone_all_breakpoints (&child_proc->breakpoints,
diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
index 8a503ec..2b34b6c 100644
--- a/gdb/testsuite/gdb.threads/fork-plus-threads.exp
+++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
@@ -51,13 +51,45 @@ proc do_test { detach_on_fork } {
 	}
     }
 
+    # gdbserver had a bug that resulted in reporting the fork child's
+    # initial stop to gdb, which gdb does not expect, in turn
+    # resulting in a broken session, like:
+    #
+    #  [Thread 31536.31536] #16 stopped.                                <== BAD
+    #  [New Thread 31547.31547]
+    #  [Inferior 10 (process 31536) exited normally]
+    #  [New Thread 31547.31560]
+    #
+    #  [Thread 31547.31547] #18 stopped.                                <== BAD
+    #  Cannot remove breakpoints because program is no longer writable. <== BAD
+    #  Further execution is probably impossible.                        <== BAD
+    #  [Inferior 11 (process 31547) exited normally]
+    #  [Inferior 1 (process 31454) exited normally]
+    #
+    # These variables track whether we see such broken behavior.
+    set saw_cannot_remove_breakpoints 0
+    set saw_thread_stopped 0
+
     set test "inferior 1 exited"
     gdb_test_multiple "" $test {
+	-re "Cannot remove breakpoints" {
+	    set saw_cannot_remove_breakpoints 1
+	    exp_continue
+	}
+	-re "Thread \[^\r\n\]+ stopped\\." {
+	    set saw_thread_stopped 1
+	    exp_continue
+	}
 	-re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
 	    pass $test
 	}
     }
 
+    gdb_assert !$saw_cannot_remove_breakpoints \
+	"no failure to remove breakpoints"
+    gdb_assert !$saw_thread_stopped \
+	"no spurious thread stop"
+
     gdb_test "info threads" "No threads\." \
 	"no threads left"
 
-- 
1.9.3


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode
  2015-07-30 18:13       ` Pedro Alves
@ 2015-07-30 18:15         ` Simon Marchi
  2015-07-30 21:06           ` Don Breazeal
  0 siblings, 1 reply; 16+ messages in thread
From: Simon Marchi @ 2015-07-30 18:15 UTC (permalink / raw)
  To: Pedro Alves, Don Breazeal, Breazeal, Don; +Cc: GDB Patches

On 15-07-30 02:13 PM, Pedro Alves wrote:
> I think that today I fixed most of the issues this uncovered that I showed
> yesterday, but there's still more to do, so I removed that bit from the
> patch, and pushed it, to both master and 7.10, in order to close PR18600,
> and remove it from the 7.10 blockers list.  Not sure yet whether the
> other new fixes will make it into 7.10.

Thanks a lot for looking into this!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode
  2015-07-30 18:15         ` Simon Marchi
@ 2015-07-30 21:06           ` Don Breazeal
  0 siblings, 0 replies; 16+ messages in thread
From: Don Breazeal @ 2015-07-30 21:06 UTC (permalink / raw)
  To: Simon Marchi, Pedro Alves; +Cc: GDB Patches

On 7/30/2015 11:15 AM, Simon Marchi wrote:
> On 15-07-30 02:13 PM, Pedro Alves wrote:
>> I think that today I fixed most of the issues this uncovered that I showed
>> yesterday, but there's still more to do, so I removed that bit from the
>> patch, and pushed it, to both master and 7.10, in order to close PR18600,
>> and remove it from the 7.10 blockers list.  Not sure yet whether the
>> other new fixes will make it into 7.10.
> 
> Thanks a lot for looking into this!
> 
Yes, thank you for cleaning this up, Pedro!
--Don

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-07-30 21:06 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-23 17:24 [PATCH v2 0/2] PR threads/18600: Threads left stopped after fork+thread spawn Pedro Alves
2015-07-23 17:25 ` [PATCH v2 2/2] PR threads/18600: Inferiors left around " Pedro Alves
2015-07-23 17:25 ` [PATCH v2 1/2] PR threads/18600: Threads left stopped " Pedro Alves
2015-07-30 18:08   ` [pushed] " Pedro Alves
2015-07-23 18:21 ` [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode Pedro Alves
2015-07-24 18:05   ` Simon Marchi
2015-07-24 18:17     ` Pedro Alves
2015-07-24 18:43   ` Don Breazeal
2015-07-29 13:21     ` Pedro Alves
2015-07-29 13:38       ` Pedro Alves
2015-07-29 14:23         ` Pedro Alves
2015-07-29 15:40           ` Pedro Alves
2015-07-29 16:40             ` Pedro Alves
2015-07-30 18:13       ` Pedro Alves
2015-07-30 18:15         ` Simon Marchi
2015-07-30 21:06           ` Don Breazeal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).