[PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt
  2016-01-28  0:48 [PATCH 0/3] PR remote/19496, remote fork failures Don Breazeal
  2016-01-28  0:48 ` [PATCH 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt Don Breazeal
  2016-01-28  0:48 ` [PATCH 3/3] PR remote/19496, timeout " Don Breazeal
@ 2016-01-28  0:48 ` Don Breazeal
  2016-02-01 19:26   ` [pushed] Test gdb.threads/forking-threads-plus-breakpoint.exp with, displaced stepping off (Re: [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt) Pedro Alves
  2016-02-01 20:38   ` [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt Pedro Alves
  2 siblings, 2 replies; 29+ messages in thread
From: Don Breazeal @ 2016-01-28  0:48 UTC (permalink / raw)
  To: gdb-patches

This patch fixes an internal error that occurs in
gdb.threads/forking-threads-plus-breakpoint.exp:

/blah/binutils-gdb/gdb/target.c:2723: internal-error: Can't determine the
current address space of thread Thread 3170.3170

In default_thread_address_space, find_inferior_ptid couldn't find 3170.3170
because it had been overwritten in inferior_appeared, called as follows:

inferior_appeared
  remote_add_inferior
    remote_notice_new_inferior
      remote_update_thread_list

The cause of the problem was the following sequence of events:

* GDB knows only about the main thread

* the first fork event is reported to GDB, saved as pending_event

* qXfer:threads_read gets the threads from the remote.
  remove_new_fork_children id's the fork child from the pending event
  and removes it from the list reported to GDB.  All the rest of the
  threads, including the fork parent, are added to the GDB thread list.

* GDB stops all the threads.  All the stop events are pushed onto the
  stop reply queue behind the pending fork event.

* remote_wait_ns calls queued_stop_reply and process_stop_reply to
  remove the fork event from the front of the stop reply queue and save
  event information in the thread_info structure for the fork parent
  thread.  Unfortunately, none of the information saved in this way is
  the fork-specific information, so the actual fork event info is lost.

* A subsequent qXfer:threads:read packet gets the thread list including
  the fork parent and fork child.  remove_new_fork_children checks the
  thread list to see if there is a fork parent, doesn't find one, checks
  the stop reply queue for a pending fork event, doesn't find one, and
  allows the fork child thread to be reported to GDB before the fork
  event has been handled.  remote_update_thread_list calls
  remote_notice_new_thread and overwrites the current (main) thread in
  inferior_appeared.  GDB has now lost all knowledge of the main thread,
  and an internal error results.

The fix was to make sure that when the stop reply was removed from the
stop reply queuei, all of the necessary fork event information was stored
in the parent thread structure.  In process_stop_reply we call a new
function, update_thread_if_fork_parent, to store the pending_follow
information from the fork stop reply in the fork parent thread.

Tested on x86_64 and Nios II Linux.  No regressions, but more failures,
which are addressed in subsequent patches in this patchset.

Thanks,
--Don

gdb/ChangeLog:
2016-01-27  Don Breazeal  <donb@codesourcery.com>

	PR remote/19496
	* remote.c (update_thread_if_fork_parent): New function.
	(process_stop_reply): Call update_thread_if_fork_parent.

---
 gdb/remote.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index b0303f6..f072ce4 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -6799,6 +6799,25 @@ remote_notif_get_pending_events (struct notif_client *nc)
     }
 }

+/* Check if the specified stop reply is for a fork event.  If it is,
+   update the corresponding thread to contain the pending follow
+   information required to identify it as the fork parent.  */
+
+static void
+update_thread_if_fork_parent (struct stop_reply *stop_reply)
+{
+  ptid_t ptid;
+
+  ptid = stop_reply->ptid;
+  if (stop_reply->ws.kind == TARGET_WAITKIND_FORKED
+      || stop_reply->ws.kind == TARGET_WAITKIND_VFORKED)
+    {
+      struct thread_info *tp = find_thread_ptid (ptid);
+
+      tp->pending_follow = stop_reply->ws;
+    }
+}
+
 /* Called when it is decided that STOP_REPLY holds the info of the
    event that is to be returned to the core.  This function always
    destroys STOP_REPLY.  */
@@ -6844,8 +6863,11 @@ process_stop_reply (struct stop_reply *stop_reply,
       remote_thr->core = stop_reply->core;
       remote_thr->stop_reason = stop_reply->stop_reason;
       remote_thr->watch_data_address = stop_reply->watch_data_address;
-    }

+      /* Make sure we record any pending fork events.  */
+      update_thread_if_fork_parent (stop_reply);
+
+    }
   stop_reply_xfree (stop_reply);
   return ptid;
 }
-- 
1.8.1.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 0/3] PR remote/19496, remote fork failures
@ 2016-01-28  0:48 Don Breazeal
  2016-01-28  0:48 ` [PATCH 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt Don Breazeal
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Don Breazeal @ 2016-01-28  0:48 UTC (permalink / raw)
  To: gdb-patches

This patchset addresses several failures in
gdb.threads/forking-threads-plus-breakpoint.exp with a Nios II Linux
target, as described in PR remote/19496.

The patches address the first three issues listed in the problem report.
There are still some intermittent failures, but they are infrequent and
as a result will probably take some time to track down.  Since this
patchset fixes the majority of the failures, it seemed to make sense to
get it pushed in now and deal with any remaining failures later.

Patch 1/3: fixes "internal error --  gdb/target.c:2713: internal-error:
Can't determine the current address space of Thread".

Patch 2/3: fixes interrupted system calls (fork and waitpid) in the test
program.

Patch 3/3: fixes timeout failures waiting for the inferior to exit.

The patches were tested on x86_64 Linux with native, native-gdbserver,
and native-extended-gdbserver, and on a Nios II Linux target with an
x86 Linux host.

Thanks,
--Don

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 3/3] PR remote/19496, timeout in forking-threads-plus-bkpt
  2016-01-28  0:48 [PATCH 0/3] PR remote/19496, remote fork failures Don Breazeal
  2016-01-28  0:48 ` [PATCH 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt Don Breazeal
@ 2016-01-28  0:48 ` Don Breazeal
  2016-02-01 12:05   ` Pedro Alves
  2016-01-28  0:48 ` [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt Don Breazeal
  2 siblings, 1 reply; 29+ messages in thread
From: Don Breazeal @ 2016-01-28  0:48 UTC (permalink / raw)
  To: gdb-patches

This patch addresses a failure in
gdb.threads/forking-threads-plus-breakpoint.exp:

FAIL: gdb.threads/forking-threads-plus-breakpoint.exp: cond_bp_target=1:
detach_on_fork=on: inferior 1 exited (timeout)

Cause:
A fork event was reported to GDB before GDB knew about the parent thread,
followed immediately by a breakpoint event in a different thread.  The
parent thread was subsequently added via remote_notice_new_inferior in
process_stop_reply, but when the thread was added the thread_info.state
was set to THREAD_STOPPED.  The fork event was then handled correctly,
but when the fork parent was resumed via a call to keep_going, the state
was unchanged.

The breakpoint event was then handled, which caused all the non-breakpoint
threads to be stopped.  When the breakpoint thread was resumed, all the
non-breakpoint threads were resumed via infrun.c:restart_threads.  Our old
fork parent wasn't restarted, because it still had thread_info.state set to
THREAD_STOPPED.  Ultimately the program under debug hung waiting for a
pthread_join while the old fork parent was stopped forever by GDB.

Fix:
Add a call to set_running just before resuming the fork parent in
infrun.c:handle_inferior_event_1.  The call could be put elsewhere
(e.g. in keep_going) but this approach makes the change for the case
of resuming after a fork event, without potential side-effects for
other cases where keep_going is called.

Tested on x86_64 Linux and Nios II Linux target with x86 Linux host.

Thanks,
--Don

gdb/ChangeLog:
2016-01-27  Don Breazeal  <donb@codesourcery.com>

	PR remote/19496
	* infrun.c (handle_inferior_event_1): Call set_running before
	resuming.

---
 gdb/infrun.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 15210c9..e324864 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -5232,7 +5232,13 @@ Cannot fill $_exitsignal with the correct signal number.\n"));
 	  ecs->ptid = inferior_ptid;

 	  if (should_resume)
-	    keep_going (ecs);
+	    {
+	      /* Make sure the thread is marked as running.  If the event
+		 occurred in the thread before we added it, it may have
+		 never been marked running.  */
+	      set_running (inferior_ptid, 1);
+	      keep_going (ecs);
+	    }
 	  else
 	    stop_waiting (ecs);
 	  return;
-- 
1.8.1.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt
  2016-01-28  0:48 [PATCH 0/3] PR remote/19496, remote fork failures Don Breazeal
@ 2016-01-28  0:48 ` Don Breazeal
  2016-02-01 19:38   ` Pedro Alves
  2016-01-28  0:48 ` [PATCH 3/3] PR remote/19496, timeout " Don Breazeal
  2016-01-28  0:48 ` [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt Don Breazeal
  2 siblings, 1 reply; 29+ messages in thread
From: Don Breazeal @ 2016-01-28  0:48 UTC (permalink / raw)
  To: gdb-patches

This patch addresses "fork:Interrupted system call" (or wait:) failures
in gdb.threads/forking-threads-plus-breakpoint.exp.

The test program spawns ten threads, each of which do ten fork/waitpid
sequences.  The cause of the problem was that when one of the fork
children exited before the corresponding fork parent could initiate its
waitpid for that child, a SIGCHLD was delivered and interrupted a fork
or waitpid in another thread.

The fix was to wrap the system calls in a loop to retry the call if
it was interrupted, like:

do
  {
    pid = fork ();
  }
while (pid == -1 && errno == EINTR);

Since this is a Linux-only test I figure it is OK to use errno and EINTR.

Tested on Nios II Linux target with x86 Linux host.

Thanks,
--Don

gdb/testsuite/ChangeLog:
2016-01-27  Don Breazeal  <donb@codesourcery.com>

	PR remote/19496
	* gdb.threads/forking-threads-plus-breakpoint.c (thread_forks):
	Retry fork and waitpid on interrupted system call errors.

---
 .../gdb.threads/forking-threads-plus-breakpoint.c  |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
index fc64d93..c169e18 100644
--- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
+++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
@@ -22,6 +22,7 @@
 #include <sys/types.h>
 #include <sys/wait.h>
 #include <stdlib.h>
+#include <errno.h>
 
 /* Number of threads.  Each thread continuously spawns a fork and wait
    for it.  If we have another thread continuously start a step over,
@@ -49,14 +50,23 @@ thread_forks (void *arg)
     {
       pid_t pid;
 
-      pid = fork ();
+      do
+	{
+	  pid = fork ();
+	}
+      while (pid == -1 && errno == EINTR);
 
       if (pid > 0)
 	{
 	  int status;
 
 	  /* Parent.  */
-	  pid = waitpid (pid, &status, 0);
+	  do
+	    {
+	      pid = waitpid (pid, &status, 0);
+	    }
+	  while (pid == -1 && errno == EINTR);
+
 	  if (pid == -1)
 	    {
 	      perror ("wait");
-- 
1.7.0.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] PR remote/19496, timeout in forking-threads-plus-bkpt
  2016-01-28  0:48 ` [PATCH 3/3] PR remote/19496, timeout " Don Breazeal
@ 2016-02-01 12:05   ` Pedro Alves
  2016-02-01 19:29     ` Don Breazeal
  0 siblings, 1 reply; 29+ messages in thread
From: Pedro Alves @ 2016-02-01 12:05 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches

On 01/28/2016 12:48 AM, Don Breazeal wrote:

> 
> ---
>  gdb/infrun.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index 15210c9..e324864 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -5232,7 +5232,13 @@ Cannot fill $_exitsignal with the correct signal number.\n"));
>  	  ecs->ptid = inferior_ptid;
>  
>  	  if (should_resume)
> -	    keep_going (ecs);
> +	    {
> +	      /* Make sure the thread is marked as running.  If the event
> +		 occurred in the thread before we added it, it may have
> +		 never been marked running.  */
> +	      set_running (inferior_ptid, 1);
> +	      keep_going (ecs);
> +	    }
>  	  else
>  	    stop_waiting (ecs);
>  	  return;
> 

Can you check whether this is still needed in current master?

Specifically, whether a2077e254098 ([PATCH] Fix PR 19461: strange "info thread"
behavior in non-stop) already fixed this?

See:
 https://sourceware.org/ml/gdb-patches/2016-01/msg00446.html
 https://sourceware.org/bugzilla/show_bug.cgi?id=19461

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [pushed] Test gdb.threads/forking-threads-plus-breakpoint.exp with, displaced stepping off (Re: [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt)
  2016-01-28  0:48 ` [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt Don Breazeal
@ 2016-02-01 19:26   ` Pedro Alves
  2016-02-01 20:38   ` [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt Pedro Alves
  1 sibling, 0 replies; 29+ messages in thread
From: Pedro Alves @ 2016-02-01 19:26 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches

On 01/28/2016 12:48 AM, Don Breazeal wrote:

> The cause of the problem was the following sequence of events:
> 
> * GDB knows only about the main thread
> 
> * the first fork event is reported to GDB, saved as pending_event
> 
> * qXfer:threads_read gets the threads from the remote.
>   remove_new_fork_children id's the fork child from the pending event
>   and removes it from the list reported to GDB.  All the rest of the
>   threads, including the fork parent, are added to the GDB thread list.
> 
> * GDB stops all the threads.  All the stop events are pushed onto the
>   stop reply queue behind the pending fork event.

As this is a non-stop test, if GDB is stopping all the threads, then it
must be that your arch doesn't support displaced stepping (which is true).

By forcing displaced stepping off, I see the same internal error on x86.

I've pushed the testsuite change below to expose the problem.

Haven't managed to look at your fix in detail yet.

From 6b2e4f10aeb64868720de06d3b2da3cc2d908f10 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Mon, 1 Feb 2016 18:48:04 +0000
Subject: [PATCH] Test gdb.threads/forking-threads-plus-breakpoint.exp with
 displaced stepping off

This exposes the internal error Don mentioned in PR19496:

  (1) internal error --  gdb/target.c:2713: internal-error: Can't determine the current address space of thread

More analysis here:

  https://sourceware.org/ml/gdb-patches/2016-01/msg00685.html

The (now kfailed) internal error looks like:

 continue &
 Continuing.
 (gdb) PASS: gdb.threads/forking-threads-plus-breakpoint.exp: cond_bp_target=1: detach_on_fork=on: displaced=off: continue &
 [New Thread 2846.2847]
 (...)
 [New Thread 2867.2867]
 /home/pedro/gdb/mygit/src/gdb/target.c:2723: internal-error: Can't determine the current address space of thread Thread 2846.2846

 A problem internal to GDB has been detected,
 further debugging may prove unreliable.
 Quit this debugging session? (y or n) KFAIL: gdb.threads/forking-threads-plus-breakpoint.exp: cond_bp_target=1: detach_on_fork=on: displaced=off: inferior 1 exited (GDB internal error) (PRMS: remote/19496)
 Resyncing due to internal error.

gdb/testsuite/ChangeLog:
2016-02-01  Pedro Alves  <palves@redhat.com>

	PR remote/19496
	* gdb.threads/forking-threads-plus-breakpoint.exp
	(displaced_stepping_supported): New global.
	(probe_displaced_stepping_support): New procedure.
	(do_test): Add 'displaced' parameter, and use it.
	(top level): Check for displaced stepping support.  Add displaced
	stepping on/off testing axis.
---
 gdb/testsuite/ChangeLog                            |  9 ++++
 .../forking-threads-plus-breakpoint.exp            | 63 ++++++++++++++++++++--
 2 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/gdb/testsuite/ChangeLog b/gdb/testsuite/ChangeLog
index 6af0e48..fb64674 100644
--- a/gdb/testsuite/ChangeLog
+++ b/gdb/testsuite/ChangeLog
@@ -1,3 +1,12 @@
+2016-02-01  Pedro Alves  <palves@redhat.com>
+
+	* gdb.threads/forking-threads-plus-breakpoint.exp
+	(displaced_stepping_supported): New global.
+	(probe_displaced_stepping_support): New procedure.
+	(do_test): Add 'displaced' parameter, and use it.
+	(top level): Check for displaced stepping support.  Add displaced
+	stepping on/off testing axis.
+
 2016-02-01  Andrew Burgess  <andrew.burgess@embecosm.com>
 
 	* gdb.mi/mi-vla-fortran.exp: Add XFAIL for accessing unassociated
diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
index b5f7c21..6c72061 100644
--- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
+++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
@@ -24,13 +24,49 @@ if {[build_executable "failed to prepare" $testfile $srcfile {debug pthreads}] =
     return -1
 }
 
+# Assume yes.
+set displaced_stepping_supported 1
+
+# "set displaced on" only tells gdb to use displaced stepping if
+# possible.  Probe for actual support.
+
+proc probe_displaced_stepping_support {} {
+    global displaced_stepping_supported
+    global binfile gdb_prompt
+
+    with_test_prefix "probe displaced-stepping support" {
+	clean_restart $binfile
+
+	gdb_test_no_output "set displaced on"
+	if ![runto_main] then {
+	    fail "Can't run to main"
+	    return 0
+	}
+
+	# We're stopped at the main breakpoint.  If displaced stepping is
+	# supported, we'll see related debug output while we step past
+	# that breakpoint.
+	gdb_test_no_output "set debug displaced 1"
+	gdb_test_multiple "next" "probe" {
+	    -re "displaced pc to.*$gdb_prompt $" {
+		pass "supported"
+	    }
+	    -re ".*$gdb_prompt $" {
+		set displaced_stepping_supported 0
+		pass "not supported"
+	    }
+	}
+    }
+}
+
 # The test proper.  If COND_BP_TARGET is true, then test with
 # conditional breakpoints evaluated on the target side, if possible.
 # DETACH_ON_FORK is used as value for the "set detach-on-fork"
 # setting.  If "on", this exercises GDB explicitly continuing the fork
 # child until exit.  If "off", this exercises GDB detaching the fork
-# child.
-proc do_test { cond_bp_target detach_on_fork } {
+# child.  DISPLACED indicates whether to use displaced stepping or
+# not.
+proc do_test { cond_bp_target detach_on_fork displaced } {
     global GDBFLAGS
     global srcfile testfile binfile
     global decimal gdb_prompt
@@ -48,6 +84,7 @@ proc do_test { cond_bp_target detach_on_fork } {
     }
 
     gdb_test_no_output "set detach-on-fork $detach_on_fork"
+    gdb_test_no_output "set displaced $displaced"
 
     gdb_test "break $linenum if zero == 1" \
 	"Breakpoint .*" \
@@ -63,6 +100,12 @@ proc do_test { cond_bp_target detach_on_fork } {
     set fork_count 0
     set ok 0
 
+    if {$displaced == "off"
+	&& [target_info exists gdb_protocol]
+	&& ([target_info gdb_protocol] == "remote"
+	    || [target_info gdb_protocol] == "extended-remote")} {
+	setup_kfail "remote/19496" *-*-*
+    }
     set test "inferior 1 exited"
     gdb_test_multiple "" $test {
 	-re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
@@ -92,14 +135,24 @@ proc do_test { cond_bp_target detach_on_fork } {
 	"only inferior 1 left"
 }
 
+probe_displaced_stepping_support
+
 foreach_with_prefix cond_bp_target {1 0} {
     foreach_with_prefix detach_on_fork {"on" "off"} {
-	do_test $cond_bp_target $detach_on_fork
-
 	# Disable "off" for now.  The test does pass with
 	# detach-on-fork off (at the time of writing), but gdb seems
 	# to slow down quadratically as inferiors are created, and
 	# then the test takes annoyingly long to complete...
-	break
+	if {$detach_on_fork == "off"} {
+	    continue
+	}
+
+	foreach_with_prefix displaced {"on" "off"} {
+	    if {$displaced == "on" && !$displaced_stepping_supported} {
+		continue
+	    }
+
+	    do_test $cond_bp_target $detach_on_fork $displaced
+	}
     }
 }
-- 
1.9.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] PR remote/19496, timeout in forking-threads-plus-bkpt
  2016-02-01 12:05   ` Pedro Alves
@ 2016-02-01 19:29     ` Don Breazeal
  2016-02-01 20:09       ` Pedro Alves
  0 siblings, 1 reply; 29+ messages in thread
From: Don Breazeal @ 2016-02-01 19:29 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 2/1/2016 4:05 AM, Pedro Alves wrote:
> On 01/28/2016 12:48 AM, Don Breazeal wrote:
> 
>>
>> ---
>>  gdb/infrun.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/gdb/infrun.c b/gdb/infrun.c
>> index 15210c9..e324864 100644
>> --- a/gdb/infrun.c
>> +++ b/gdb/infrun.c
>> @@ -5232,7 +5232,13 @@ Cannot fill $_exitsignal with the correct signal number.\n"));
>>  	  ecs->ptid = inferior_ptid;
>>  
>>  	  if (should_resume)
>> -	    keep_going (ecs);
>> +	    {
>> +	      /* Make sure the thread is marked as running.  If the event
>> +		 occurred in the thread before we added it, it may have
>> +		 never been marked running.  */
>> +	      set_running (inferior_ptid, 1);
>> +	      keep_going (ecs);
>> +	    }
>>  	  else
>>  	    stop_waiting (ecs);
>>  	  return;
>>
> 
> Can you check whether this is still needed in current master?
> 
> Specifically, whether a2077e254098 ([PATCH] Fix PR 19461: strange "info thread"
> behavior in non-stop) already fixed this?
> 
> See:
>  https://sourceware.org/ml/gdb-patches/2016-01/msg00446.html
>  https://sourceware.org/bugzilla/show_bug.cgi?id=19461
> 
> Thanks,
> Pedro Alves
> 
Hi Pedro,
It looks like my patch is still needed.  FYI, the values of the
variables used to determine whether to call set_running in your patch
are this, in my failing test case:

follow_child 0
detach_fork 1
non_stop 1
sched_multi 0
target_is_non_stop_p () 1

Thanks,
--Don

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt
  2016-01-28  0:48 ` [PATCH 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt Don Breazeal
@ 2016-02-01 19:38   ` Pedro Alves
  0 siblings, 0 replies; 29+ messages in thread
From: Pedro Alves @ 2016-02-01 19:38 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches

On 01/28/2016 12:48 AM, Don Breazeal wrote:
> This patch addresses "fork:Interrupted system call" (or wait:) failures
> in gdb.threads/forking-threads-plus-breakpoint.exp.
> 
> The test program spawns ten threads, each of which do ten fork/waitpid
> sequences.  The cause of the problem was that when one of the fork
> children exited before the corresponding fork parent could initiate its
> waitpid for that child, a SIGCHLD was delivered and interrupted a fork
> or waitpid in another thread.
> 
> The fix was to wrap the system calls in a loop to retry the call if
> it was interrupted, like:
> 
> do
>   {
>     pid = fork ();
>   }
> while (pid == -1 && errno == EINTR);
> 
> Since this is a Linux-only test I figure it is OK to use errno and EINTR.
> 
> Tested on Nios II Linux target with x86 Linux host.

I'd prefer to avoid this if possible.  These loops potentially hide
bugs like ERESTARTSYS escaping out of a syscall and mishandling of
signals.  See bc9540e842eb5639ca59cb133adef211d252843c for example:
   https://sourceware.org/ml/gdb-patches/2015-02/msg00654.html

How about setting SIGCHLD to SIG_IGN, or making SIGCHLD be SA_RESTART?

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] PR remote/19496, timeout in forking-threads-plus-bkpt
  2016-02-01 19:29     ` Don Breazeal
@ 2016-02-01 20:09       ` Pedro Alves
  2016-02-11  0:28         ` [PATCH v2 2/3] PR remote/19496, interrupted syscall " Don Breazeal
  2016-02-11  0:28         ` [PATCH v2 3/3] PR remote/19496, timeout " Don Breazeal
  0 siblings, 2 replies; 29+ messages in thread
From: Pedro Alves @ 2016-02-01 20:09 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches

On 02/01/2016 07:29 PM, Don Breazeal wrote:
> On 2/1/2016 4:05 AM, Pedro Alves wrote:

> Hi Pedro,
> It looks like my patch is still needed.  FYI, the values of the
> variables used to determine whether to call set_running in your patch
> are this, in my failing test case:
> 
> follow_child 0
> detach_fork 1
> non_stop 1
> sched_multi 0
> target_is_non_stop_p () 1

Hmm.

> A fork event was reported to GDB before GDB knew about the parent thread,
> followed immediately by a breakpoint event in a different thread.  The
> parent thread was subsequently added via remote_notice_new_inferior in
> process_stop_reply, but when the thread was added the thread_info.state
> was set to THREAD_STOPPED.  The fork event was then handled correctly,
> but when the fork parent was resumed via a call to keep_going, the state
> was unchanged.

Since this is non-stop, then it sounds to me like the bug is that the
thread should have been added in THREAD_RUNNING state.

Consider that infrun may be pulling target events out of the target_ops
backend into its own event queue, but, not process them immediately.

E.g., infrun may be stopping all threads temporarily for a step-over-breakpoint
operation for thread A (stop_all_threads).  The waitstatus of all threads
is thus left pending in the thread structure (save_status), including the
fork event of thread B.  Right at this point, if the user
does "info threads", that should show thread B (the fork parent) running,
not stopped, even if internally, gdb is holding it paused for a little bit.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt
  2016-01-28  0:48 ` [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt Don Breazeal
  2016-02-01 19:26   ` [pushed] Test gdb.threads/forking-threads-plus-breakpoint.exp with, displaced stepping off (Re: [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt) Pedro Alves
@ 2016-02-01 20:38   ` Pedro Alves
  2016-02-11  0:26     ` [PATCH v2 " Don Breazeal
  1 sibling, 1 reply; 29+ messages in thread
From: Pedro Alves @ 2016-02-01 20:38 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches

On 01/28/2016 12:48 AM, Don Breazeal wrote:
> This patch fixes an internal error that occurs in
> gdb.threads/forking-threads-plus-breakpoint.exp:
> 
> /blah/binutils-gdb/gdb/target.c:2723: internal-error: Can't determine the
> current address space of thread Thread 3170.3170
> 
> In default_thread_address_space, find_inferior_ptid couldn't find 3170.3170
> because it had been overwritten in inferior_appeared, called as follows:
> 
> inferior_appeared
>   remote_add_inferior
>     remote_notice_new_inferior
>       remote_update_thread_list
> 
> The cause of the problem was the following sequence of events:
> 
> * GDB knows only about the main thread
> 
> * the first fork event is reported to GDB, saved as pending_event
> 
> * qXfer:threads_read gets the threads from the remote.
>   remove_new_fork_children id's the fork child from the pending event
>   and removes it from the list reported to GDB.  All the rest of the
>   threads, including the fork parent, are added to the GDB thread list.
> 
> * GDB stops all the threads.  All the stop events are pushed onto the
>   stop reply queue behind the pending fork event.
> 
> * remote_wait_ns calls queued_stop_reply and process_stop_reply to
>   remove the fork event from the front of the stop reply queue and save
>   event information in the thread_info structure for the fork parent
>   thread.  Unfortunately, none of the information saved in this way is
>   the fork-specific information, so the actual fork event info is lost.
> 
> * A subsequent qXfer:threads:read packet gets the thread list including
>   the fork parent and fork child.  remove_new_fork_children checks the
>   thread list to see if there is a fork parent, doesn't find one, checks
>   the stop reply queue for a pending fork event, doesn't find one, and
>   allows the fork child thread to be reported to GDB before the fork
>   event has been handled.  remote_update_thread_list calls
>   remote_notice_new_thread and overwrites the current (main) thread in
>   inferior_appeared.  GDB has now lost all knowledge of the main thread,
>   and an internal error results.
> 
> The fix was to make sure that when the stop reply was removed from the
> stop reply queuei, all of the necessary fork event information was stored
> in the parent thread structure.  In process_stop_reply we call a new
> function, update_thread_if_fork_parent, to store the pending_follow
> information from the fork stop reply in the fork parent thread.
> 
> Tested on x86_64 and Nios II Linux.  No regressions, but more failures,
> which are addressed in subsequent patches in this patchset.
> 

Many thanks for the detective work!

> +static void
> +update_thread_if_fork_parent (struct stop_reply *stop_reply)
> +{
> +  ptid_t ptid;
> +
> +  ptid = stop_reply->ptid;
> +  if (stop_reply->ws.kind == TARGET_WAITKIND_FORKED
> +      || stop_reply->ws.kind == TARGET_WAITKIND_VFORKED)
> +    {
> +      struct thread_info *tp = find_thread_ptid (ptid);
> +
> +      tp->pending_follow = stop_reply->ws;
> +    }
> +}

So the fork event has been reported out of target_wait but it was
left pending on the infrun side (infrun.c:save_waitstatus)

IOW, the fork event hasn't been processed by handle_inferior_event
yet, so it hasn't made it to tp->pending_follow yet.

The information is not lost, we're just looking for it in the
wrong place.  I think this would be the right fix:

From 211e4553a500f7a81d11860f9707db97b0a53c45 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Mon, 1 Feb 2016 20:25:00 +0000
Subject: [PATCH] Fix PR remote/19496, internal error

---
 gdb/remote.c                                                  | 7 ++++++-
 gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp | 6 ------
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index 8831b50..5dffb98 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -6180,7 +6180,12 @@ remove_new_fork_children (struct threads_listing_context *context)
      fork child threads from the CONTEXT list.  */
   ALL_NON_EXITED_THREADS (thread)
     {
-      struct target_waitstatus *ws = &thread->pending_follow;
+      struct target_waitstatus *ws;
+
+      if (thread->suspend.waitstatus_pending_p)
+	ws = &thread->suspend.waitstatus;
+      else
+	ws = &thread->pending_follow;

       if (is_pending_fork_parent (ws, pid, thread->ptid))
 	{
diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
index 6c72061..ff3ca9a 100644
--- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
+++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
@@ -100,12 +100,6 @@ proc do_test { cond_bp_target detach_on_fork displaced } {
     set fork_count 0
     set ok 0

-    if {$displaced == "off"
-	&& [target_info exists gdb_protocol]
-	&& ([target_info gdb_protocol] == "remote"
-	    || [target_info gdb_protocol] == "extended-remote")} {
-	setup_kfail "remote/19496" *-*-*
-    }
     set test "inferior 1 exited"
     gdb_test_multiple "" $test {
 	-re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
-- 
1.9.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 1/3] PR remote/19496, internal err forking-threads-plus-bkpt
  2016-02-01 20:38   ` [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt Pedro Alves
@ 2016-02-11  0:26     ` Don Breazeal
  2016-02-12 20:15       ` Pedro Alves
  0 siblings, 1 reply; 29+ messages in thread
From: Don Breazeal @ 2016-02-11  0:26 UTC (permalink / raw)
  To: gdb-patches, palves

Hi Pedro,
Here is an update to the patch per your suggestion:

On 2/1/2016 12:38 PM, Pedro Alves wrote:
> On 01/28/2016 12:48 AM, Don Breazeal wrote:
>> This patch fixes an internal error that occurs in
>> gdb.threads/forking-threads-plus-breakpoint.exp:
>>
>> /blah/binutils-gdb/gdb/target.c:2723: internal-error: Can't determine the
>> current address space of thread Thread 3170.3170
>>
>> In default_thread_address_space, find_inferior_ptid couldn't find 3170.3170
>> because it had been overwritten in inferior_appeared,
---snip---
>> +static void
>> +update_thread_if_fork_parent (struct stop_reply *stop_reply)
>> +{
>> +  ptid_t ptid;
>> +
>> +  ptid = stop_reply->ptid;
>> +  if (stop_reply->ws.kind == TARGET_WAITKIND_FORKED
>> +      || stop_reply->ws.kind == TARGET_WAITKIND_VFORKED)
>> +    {
>> +      struct thread_info *tp = find_thread_ptid (ptid);
>> +
>> +      tp->pending_follow = stop_reply->ws;
>> +    }
>> +}
> 
> So the fork event has been reported out of target_wait but it was
> left pending on the infrun side (infrun.c:save_waitstatus)
> 
> IOW, the fork event hasn't been processed by handle_inferior_event
> yet, so it hasn't made it to tp->pending_follow yet.
> 
> The information is not lost, we're just looking for it in the
> wrong place.  I think this would be the right fix:

The fix below is essentially unchanged from your suggested fix.  The commit
message has been updated for the new fix.

Tested on Nios II Linux target with x86 Linux host, and native x86_64 Linux.

Thanks!
--Don

-----
This patch fixes an internal error that occurs in
gdb.threads/forking-threads-plus-breakpoint.exp:

/blah/binutils-gdb/gdb/target.c:2723: internal-error: Can't determine the
current address space of thread Thread 3170.3170

In default_thread_address_space, find_inferior_ptid couldn't find 3170.3170
because it had been overwritten in inferior_appeared, called as follows:

inferior_appeared
  remote_add_inferior
    remote_notice_new_inferior
      remote_update_thread_list

The cause of the problem was the following sequence of events:

* GDB knows only about the main thread

* the first fork event is reported to GDB, saved as pending_event

* qXfer:threads:read gets the threads from the remote.
  remove_new_fork_children id's the fork child from the pending event
  and removes it from the list reported to GDB.  All the rest of the
  threads, including the fork parent, are added to the GDB thread list.

* GDB stops all the threads.  All the stop events are pushed onto the
  stop reply queue behind the pending fork event.  The fork waitstatus
  is saved in the fork parent thread's pending status field
  thread_info.suspend.

* remote_wait_ns calls queued_stop_reply and process_stop_reply to
  remove the fork event from the front of the stop reply queue and save
  event information in the thread_info structure for the fork parent
  thread.  Unfortunately, none of the information saved in this way is
  the fork-specific information.

* A subsequent qXfer:threads:read packet gets the thread list including
  the fork parent and fork child.  remove_new_fork_children checks the
  thread list to see if there is a fork parent, doesn't find one, checks
  the stop reply queue for a pending fork event, doesn't find one, and
  allows the fork child thread to be reported to GDB before the fork
  event has been handled.  remote_update_thread_list calls
  remote_notice_new_thread and overwrites the current (main) thread in
  inferior_appeared.

So the fork event has been reported out of target_wait but it was left
pending on the infrun side (infrun.c:save_waitstatus).  IOW, the fork
event hasn't been processed by handle_inferior_event yet, so it hasn't
made it to tp->pending_follow yet.

The fix is to check thread_info.suspend along with the
thread_info.pending_follow in remote.c:remove_new_fork_children, to
prevent premature reporting of the fork child thread creation.

gdb/ChangeLog:
2016-02-10  Don Breazeal  <donb@codesourcery.com>

	PR remote/19496
	* remote.c (remove_new_fork_children): Check for pending
	fork status in thread_info.suspend.

gdb/testsuite/ChangeLog:
2016-02-10  Don Breazeal  <donb@codesourcery.com>

	PR remote/19496
	* gdb.threads/forking-threads-plus-breakpoint.exp (do_test):
	Remove kfail for PR remote/19496.

---
 gdb/remote.c                                                  | 7 ++++++-
 gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp | 6 ------
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index 6d56f19..f09a06e 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -6166,7 +6166,12 @@ remove_new_fork_children (struct threads_listing_context *context)
      fork child threads from the CONTEXT list.  */
   ALL_NON_EXITED_THREADS (thread)
     {
-      struct target_waitstatus *ws = &thread->pending_follow;
+      struct target_waitstatus *ws;
+
+      if (thread->suspend.waitstatus_pending_p)
+	ws = &thread->suspend.waitstatus;
+      else
+	ws = &thread->pending_follow;

       if (is_pending_fork_parent (ws, pid, thread->ptid))
 	{
diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
index 6c72061..ff3ca9a 100644
--- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
+++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
@@ -100,12 +100,6 @@ proc do_test { cond_bp_target detach_on_fork displaced } {
     set fork_count 0
     set ok 0

-    if {$displaced == "off"
-	&& [target_info exists gdb_protocol]
-	&& ([target_info gdb_protocol] == "remote"
-	    || [target_info gdb_protocol] == "extended-remote")} {
-	setup_kfail "remote/19496" *-*-*
-    }
     set test "inferior 1 exited"
     gdb_test_multiple "" $test {
 	-re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
-- 
1.8.1.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 3/3] PR remote/19496, timeout in forking-threads-plus-bkpt
  2016-02-01 20:09       ` Pedro Alves
  2016-02-11  0:28         ` [PATCH v2 2/3] PR remote/19496, interrupted syscall " Don Breazeal
@ 2016-02-11  0:28         ` Don Breazeal
  2016-02-25 17:29           ` [PING]Re: " Don Breazeal
  2016-03-15 15:30           ` Pedro Alves
  1 sibling, 2 replies; 29+ messages in thread
From: Don Breazeal @ 2016-02-11  0:28 UTC (permalink / raw)
  To: gdb-patches, palves

On 2/1/2016 12:09 PM, Pedro Alves wrote:
> On 02/01/2016 07:29 PM, Don Breazeal wrote:
>> On 2/1/2016 4:05 AM, Pedro Alves wrote:
> 
>> Hi Pedro,
---snip---
>> A fork event was reported to GDB before GDB knew about the parent thread,
>> followed immediately by a breakpoint event in a different thread.  The
>> parent thread was subsequently added via remote_notice_new_inferior in
>> process_stop_reply, but when the thread was added the thread_info.state
>> was set to THREAD_STOPPED.  The fork event was then handled correctly,
>> but when the fork parent was resumed via a call to keep_going, the state
>> was unchanged.
> 
> Since this is non-stop, then it sounds to me like the bug is that the
> thread should have been added in THREAD_RUNNING state.
> 
> Consider that infrun may be pulling target events out of the target_ops
> backend into its own event queue, but, not process them immediately.
> 
> E.g., infrun may be stopping all threads temporarily for a step-over-breakpoint
> operation for thread A (stop_all_threads).  The waitstatus of all threads
> is thus left pending in the thread structure (save_status), including the
> fork event of thread B.  Right at this point, if the user
> does "info threads", that should show thread B (the fork parent) running,
> not stopped, even if internally, gdb is holding it paused for a little bit.
> 

Hi Pedro,
Here is a new patch that adds the threads with the state set to
THREAD_RUNNING for fork events.
Thanks!
--Don

This patch addresses a failure in
gdb.threads/forking-threads-plus-breakpoint.exp:

FAIL: gdb.threads/forking-threads-plus-breakpoint.exp: cond_bp_target=1:
detach_on_fork=on: inferior 1 exited (timeout)

Cause:
A fork event was reported to GDB before GDB knew about the parent thread,
followed immediately by a breakpoint event in a different thread.  The
parent thread was subsequently added via remote_notice_new_inferior in
process_stop_reply, but when the thread was added the thread_info.state
was set to THREAD_STOPPED.  The fork event was then handled correctly,
but when the fork parent was resumed via a call to keep_going, the state
was unchanged.

The breakpoint event was then handled, which caused all the non-breakpoint
threads to be stopped.  When the breakpoint thread was resumed, all the
non-breakpoint threads were resumed via infrun.c:restart_threads.  Our old
fork parent wasn't restarted, because it still had thread_info.state set to
THREAD_STOPPED.  Ultimately the program under debug hung waiting for a
pthread_join while the old fork parent was stopped forever by GDB.

Fix:
Make sure to add the fork parent thread in the THREAD_RUNNING state by
calling remote_notice_new_inferior with RUNNING set to 1 when processing
a fork stop reply.

Tested on x86_64 Linux and Nios II Linux target with x86 Linux host.

gdb/ChangeLog:
2016-02-10  Don Breazeal  <donb@codesourcery.com>

	* remote.c (process_stop_reply): Call remote_notice_new_inferior
	with RUNNING set to 1 when handling fork events.

---
 gdb/remote.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index f09a06e..ab750a7 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -6818,7 +6818,14 @@ process_stop_reply (struct stop_reply *stop_reply,
 	  VEC_free (cached_reg_t, stop_reply->regcache);
 	}
 
-      remote_notice_new_inferior (ptid, 0);
+      /* If a fork event arrived before we knew about the parent thread,
+	 make sure to mark it as running when it is created.  */
+      if (status->kind == TARGET_WAITKIND_FORKED
+	  || status->kind == TARGET_WAITKIND_VFORKED)
+	remote_notice_new_inferior (ptid, 1);
+      else
+	remote_notice_new_inferior (ptid, 0);
+
       remote_thr = demand_private_info (ptid);
       remote_thr->core = stop_reply->core;
       remote_thr->stop_reason = stop_reply->stop_reason;
-- 
1.8.1.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt
  2016-02-01 20:09       ` Pedro Alves
@ 2016-02-11  0:28         ` Don Breazeal
  2016-02-25 17:28           ` Don Breazeal
  2016-03-15 12:55           ` Pedro Alves
  2016-02-11  0:28         ` [PATCH v2 3/3] PR remote/19496, timeout " Don Breazeal
  1 sibling, 2 replies; 29+ messages in thread
From: Don Breazeal @ 2016-02-11  0:28 UTC (permalink / raw)
  To: gdb-patches, palves

Hi Pedro,

On 2/1/2016 11:38 AM, Pedro Alves wrote:
> On 01/28/2016 12:48 AM, Don Breazeal wrote:
>> This patch addresses "fork:Interrupted system call" (or wait:) failures
>> in gdb.threads/forking-threads-plus-breakpoint.exp.
>>
>> The test program spawns ten threads, each of which do ten fork/waitpid
>> sequences.  The cause of the problem was that when one of the fork
>> children exited before the corresponding fork parent could initiate its
>> waitpid for that child, a SIGCHLD was delivered and interrupted a fork
>> or waitpid in another thread.

In fact, I think my diagnosis here was incorrect, or at least incorrect
in some cases.  I believe at least some of the interruptions are caused
by SIGSTOP, sent by GDB when stopping all the threads.  More below.

>>
>> The fix was to wrap the system calls in a loop to retry the call if
>> it was interrupted, like:
>>
>> do
>>   {
>>     pid = fork ();
>>   }
>> while (pid == -1 && errno == EINTR);
>>
>> Since this is a Linux-only test I figure it is OK to use errno and EINTR.
>>
>> Tested on Nios II Linux target with x86 Linux host.
> 
> I'd prefer to avoid this if possible.  These loops potentially hide
> bugs like ERESTARTSYS escaping out of a syscall and mishandling of
> signals.  See bc9540e842eb5639ca59cb133adef211d252843c for example:
>    https://sourceware.org/ml/gdb-patches/2015-02/msg00654.html
> 
> How about setting SIGCHLD to SIG_IGN, or making SIGCHLD be SA_RESTART?

I spent a couple of days trying to find an alternate solution, but
couldn't find one that was reliable.  Here is a snapshot of what I tried:

1) SIG_IGN: results in an ECHILD from waitpid.  The man page for waitpid
   says "This can happen for one's own child if the action for SIGCHLD is
   set to SIG_IGN."

2) SA_RESTART: While waitpid is listed as a system call that can be
   restarted by SA_RESTART, fork is not.  Even if I leave the "EINTR loop"
   in place for fork, using SA_RESTART I still see an interrupted system
   call for waitpid.  Possibly because the problem is SIGSTOP and not
   SIGCHLD.

3) pthread_sigblock: With this set for SIGCHLD in all the threads, I
   still saw an interrupted system call.  You can't block SIGSTOP.

4) pthread_sigblock with sigwait: using pthread_sigblock on all the
   blockable signals with a signal thread that called sigwait for all
   the signals in a loop, the signal thread would see a bunch of SIGCHLDs,
   but there would eventually be an interrupted system call.

5) bsd_signal: this function is supposed to automatically restart blocking
   system calls.  fork is not a blocking system call, but it doesn't help
   for waitpid either.

I found this in the ptrace(2) man page: "Note that a suppressed signal
still causes system calls to return prematurely.  In this case, system
calls will be restarted: the tracer will observe the tracee to reexecute
the interrupted system call (or restart_syscall(2) system call for a few
system calls which use a different mechanism for restarting) if the tracer
uses PTRACE_SYSCALL.  Even system calls (such as poll(2)) which are not
restartable after signal are restarted after signal is suppressed; however,
kernel bugs exist which cause some system calls to fail with EINTR even
though no observable signal is injected to the tracee."

The GDB manual mentions something similar about interrupted system calls.

So, the bottom line is that I haven't changed the fix for the interrupted
system calls, because I can't find anything that works as well as the
original fix.  Perhaps this test puts enough stress on the kernel that the
kernel bugs mentioned above are exposed.

One change I did make from the previous version was to increase the
timeout to 90 seconds, which was necessary to get more reliable results
on the Nios II target.

Let me know what you think.
Thanks!
--Don

---
This patch addresses "fork:Interrupted system call" (or wait:) failures
in gdb.threads/forking-threads-plus-breakpoint.exp.

The test program spawns ten threads, each of which do ten fork/waitpid
sequences.  The cause of the problem was that when one of the fork
children exited before the corresponding fork parent could initiate its
waitpid for that child, a SIGCHLD was delivered and interrupted a fork
or waitpid in another thread.

The fix was to wrap the system calls in a loop to retry the call if
it was interrupted, like:

do
  {
    pid = fork ();
  }
while (pid == -1 && errno == EINTR);

Since this is a Linux-only test I figure it is OK to use errno and EINTR.
I tried a number of alternative fixes using SIG_IGN, SA_RESTART,
pthread_sigblock, and bsd_signal, but none of these worked as well.

Tested on Nios II Linux target with x86 Linux host.

gdb/testsuite/ChangeLog:
2016-02-10  Don Breazeal  <donb@codesourcery.com>

	* gdb.threads/forking-threads-plus-breakpoint.c (thread_forks):
	Retry fork and waitpid on interrupted system call errors.
	* gdb.threads/forking-threads-plus-breakpoint.exp: (do_test):
	Increase timeout to 90.

---
 .../gdb.threads/forking-threads-plus-breakpoint.c          | 14 ++++++++++++--
 .../gdb.threads/forking-threads-plus-breakpoint.exp        |  3 +++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
index fc64d93..c169e18 100644
--- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
+++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
@@ -22,6 +22,7 @@
 #include <sys/types.h>
 #include <sys/wait.h>
 #include <stdlib.h>
+#include <errno.h>

 /* Number of threads.  Each thread continuously spawns a fork and wait
    for it.  If we have another thread continuously start a step over,
@@ -49,14 +50,23 @@ thread_forks (void *arg)
     {
       pid_t pid;

-      pid = fork ();
+      do
+	{
+	  pid = fork ();
+	}
+      while (pid == -1 && errno == EINTR);

       if (pid > 0)
 	{
 	  int status;

 	  /* Parent.  */
-	  pid = waitpid (pid, &status, 0);
+	  do
+	    {
+	      pid = waitpid (pid, &status, 0);
+	    }
+	  while (pid == -1 && errno == EINTR);
+
 	  if (pid == -1)
 	    {
 	      perror ("wait");
diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
index ff3ca9a..6889c2b 100644
--- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
+++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
@@ -73,6 +73,9 @@ proc do_test { cond_bp_target detach_on_fork displaced } {
     global linenum
     global is_remote_target

+    global timeout
+    set timeout 90
+
     set saved_gdbflags $GDBFLAGS
     set GDBFLAGS [concat $GDBFLAGS " -ex \"set non-stop on\""]
     clean_restart $binfile
-- 
1.8.1.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] PR remote/19496, internal err forking-threads-plus-bkpt
  2016-02-11  0:26     ` [PATCH v2 " Don Breazeal
@ 2016-02-12 20:15       ` Pedro Alves
  2016-02-16 17:21         ` Don Breazeal
  0 siblings, 1 reply; 29+ messages in thread
From: Pedro Alves @ 2016-02-12 20:15 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches

Hi Don,

On 02/11/2016 12:26 AM, Don Breazeal wrote:

> On 2/1/2016 12:38 PM, Pedro Alves wrote:

>> So the fork event has been reported out of target_wait but it was
>> left pending on the infrun side (infrun.c:save_waitstatus)
>>
>> IOW, the fork event hasn't been processed by handle_inferior_event
>> yet, so it hasn't made it to tp->pending_follow yet.
>>
>> The information is not lost, we're just looking for it in the
>> wrong place.  I think this would be the right fix:
> 
> The fix below is essentially unchanged from your suggested fix.  The commit
> message has been updated for the new fix.
> 
> Tested on Nios II Linux target with x86 Linux host, and native x86_64 Linux.

OK.  Fine for 7.11 as well.

I'll need a bit more to think over patches 2 and 3.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] PR remote/19496, internal err forking-threads-plus-bkpt
  2016-02-12 20:15       ` Pedro Alves
@ 2016-02-16 17:21         ` Don Breazeal
  0 siblings, 0 replies; 29+ messages in thread
From: Don Breazeal @ 2016-02-16 17:21 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 2/12/2016 12:15 PM, Pedro Alves wrote:
> Hi Don,
> 
> On 02/11/2016 12:26 AM, Don Breazeal wrote:
> 
>> On 2/1/2016 12:38 PM, Pedro Alves wrote:
> 
>>> So the fork event has been reported out of target_wait but it was
>>> left pending on the infrun side (infrun.c:save_waitstatus)
>>>
>>> IOW, the fork event hasn't been processed by handle_inferior_event
>>> yet, so it hasn't made it to tp->pending_follow yet.
>>>
>>> The information is not lost, we're just looking for it in the
>>> wrong place.  I think this would be the right fix:
>>
>> The fix below is essentially unchanged from your suggested fix.  The commit
>> message has been updated for the new fix.
>>
>> Tested on Nios II Linux target with x86 Linux host, and native x86_64 Linux.
> 
> OK.  Fine for 7.11 as well.
> 
> I'll need a bit more to think over patches 2 and 3.
> 
> Thanks,
> Pedro Alves
> 
Thanks Pedro.  Patch 1 is now pushed to mainline and 7.11.
--Don

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt
  2016-02-11  0:28         ` [PATCH v2 2/3] PR remote/19496, interrupted syscall " Don Breazeal
@ 2016-02-25 17:28           ` Don Breazeal
  2016-03-03 18:19             ` [PING] " Don Breazeal
  2016-03-15 12:55           ` Pedro Alves
  1 sibling, 1 reply; 29+ messages in thread
From: Don Breazeal @ 2016-02-25 17:28 UTC (permalink / raw)
  To: gdb-patches, palves

Ping
Thanks,
--Don

On 2/10/2016 4:28 PM, Don Breazeal wrote:
> Hi Pedro,
> 
> On 2/1/2016 11:38 AM, Pedro Alves wrote:
>> On 01/28/2016 12:48 AM, Don Breazeal wrote:
>>> This patch addresses "fork:Interrupted system call" (or wait:) failures
>>> in gdb.threads/forking-threads-plus-breakpoint.exp.
>>>
>>> The test program spawns ten threads, each of which do ten fork/waitpid
>>> sequences.  The cause of the problem was that when one of the fork
>>> children exited before the corresponding fork parent could initiate its
>>> waitpid for that child, a SIGCHLD was delivered and interrupted a fork
>>> or waitpid in another thread.
> 
> In fact, I think my diagnosis here was incorrect, or at least incorrect
> in some cases.  I believe at least some of the interruptions are caused
> by SIGSTOP, sent by GDB when stopping all the threads.  More below.
> 
>>>
>>> The fix was to wrap the system calls in a loop to retry the call if
>>> it was interrupted, like:
>>>
>>> do
>>>   {
>>>     pid = fork ();
>>>   }
>>> while (pid == -1 && errno == EINTR);
>>>
>>> Since this is a Linux-only test I figure it is OK to use errno and EINTR.
>>>
>>> Tested on Nios II Linux target with x86 Linux host.
>>
>> I'd prefer to avoid this if possible.  These loops potentially hide
>> bugs like ERESTARTSYS escaping out of a syscall and mishandling of
>> signals.  See bc9540e842eb5639ca59cb133adef211d252843c for example:
>>    https://sourceware.org/ml/gdb-patches/2015-02/msg00654.html
>>
>> How about setting SIGCHLD to SIG_IGN, or making SIGCHLD be SA_RESTART?
> 
> I spent a couple of days trying to find an alternate solution, but
> couldn't find one that was reliable.  Here is a snapshot of what I tried:
> 
> 1) SIG_IGN: results in an ECHILD from waitpid.  The man page for waitpid
>    says "This can happen for one's own child if the action for SIGCHLD is
>    set to SIG_IGN."
> 
> 2) SA_RESTART: While waitpid is listed as a system call that can be
>    restarted by SA_RESTART, fork is not.  Even if I leave the "EINTR loop"
>    in place for fork, using SA_RESTART I still see an interrupted system
>    call for waitpid.  Possibly because the problem is SIGSTOP and not
>    SIGCHLD.
> 
> 3) pthread_sigblock: With this set for SIGCHLD in all the threads, I
>    still saw an interrupted system call.  You can't block SIGSTOP.
> 
> 4) pthread_sigblock with sigwait: using pthread_sigblock on all the
>    blockable signals with a signal thread that called sigwait for all
>    the signals in a loop, the signal thread would see a bunch of SIGCHLDs,
>    but there would eventually be an interrupted system call.
> 
> 5) bsd_signal: this function is supposed to automatically restart blocking
>    system calls.  fork is not a blocking system call, but it doesn't help
>    for waitpid either.
> 
> I found this in the ptrace(2) man page: "Note that a suppressed signal
> still causes system calls to return prematurely.  In this case, system
> calls will be restarted: the tracer will observe the tracee to reexecute
> the interrupted system call (or restart_syscall(2) system call for a few
> system calls which use a different mechanism for restarting) if the tracer
> uses PTRACE_SYSCALL.  Even system calls (such as poll(2)) which are not
> restartable after signal are restarted after signal is suppressed; however,
> kernel bugs exist which cause some system calls to fail with EINTR even
> though no observable signal is injected to the tracee."
> 
> The GDB manual mentions something similar about interrupted system calls.
> 
> So, the bottom line is that I haven't changed the fix for the interrupted
> system calls, because I can't find anything that works as well as the
> original fix.  Perhaps this test puts enough stress on the kernel that the
> kernel bugs mentioned above are exposed.
> 
> One change I did make from the previous version was to increase the
> timeout to 90 seconds, which was necessary to get more reliable results
> on the Nios II target.
> 
> Let me know what you think.
> Thanks!
> --Don
> 
> ---
> This patch addresses "fork:Interrupted system call" (or wait:) failures
> in gdb.threads/forking-threads-plus-breakpoint.exp.
> 
> The test program spawns ten threads, each of which do ten fork/waitpid
> sequences.  The cause of the problem was that when one of the fork
> children exited before the corresponding fork parent could initiate its
> waitpid for that child, a SIGCHLD was delivered and interrupted a fork
> or waitpid in another thread.
> 
> The fix was to wrap the system calls in a loop to retry the call if
> it was interrupted, like:
> 
> do
>   {
>     pid = fork ();
>   }
> while (pid == -1 && errno == EINTR);
> 
> Since this is a Linux-only test I figure it is OK to use errno and EINTR.
> I tried a number of alternative fixes using SIG_IGN, SA_RESTART,
> pthread_sigblock, and bsd_signal, but none of these worked as well.
> 
> Tested on Nios II Linux target with x86 Linux host.
> 
> gdb/testsuite/ChangeLog:
> 2016-02-10  Don Breazeal  <donb@codesourcery.com>
> 
> 	* gdb.threads/forking-threads-plus-breakpoint.c (thread_forks):
> 	Retry fork and waitpid on interrupted system call errors.
> 	* gdb.threads/forking-threads-plus-breakpoint.exp: (do_test):
> 	Increase timeout to 90.
> 
> ---
>  .../gdb.threads/forking-threads-plus-breakpoint.c          | 14 ++++++++++++--
>  .../gdb.threads/forking-threads-plus-breakpoint.exp        |  3 +++
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
> index fc64d93..c169e18 100644
> --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
> +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
> @@ -22,6 +22,7 @@
>  #include <sys/types.h>
>  #include <sys/wait.h>
>  #include <stdlib.h>
> +#include <errno.h>
>  
>  /* Number of threads.  Each thread continuously spawns a fork and wait
>     for it.  If we have another thread continuously start a step over,
> @@ -49,14 +50,23 @@ thread_forks (void *arg)
>      {
>        pid_t pid;
>  
> -      pid = fork ();
> +      do
> +	{
> +	  pid = fork ();
> +	}
> +      while (pid == -1 && errno == EINTR);
>  
>        if (pid > 0)
>  	{
>  	  int status;
>  
>  	  /* Parent.  */
> -	  pid = waitpid (pid, &status, 0);
> +	  do
> +	    {
> +	      pid = waitpid (pid, &status, 0);
> +	    }
> +	  while (pid == -1 && errno == EINTR);
> +
>  	  if (pid == -1)
>  	    {
>  	      perror ("wait");
> diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
> index ff3ca9a..6889c2b 100644
> --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
> +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
> @@ -73,6 +73,9 @@ proc do_test { cond_bp_target detach_on_fork displaced } {
>      global linenum
>      global is_remote_target
>  
> +    global timeout
> +    set timeout 90
> +
>      set saved_gdbflags $GDBFLAGS
>      set GDBFLAGS [concat $GDBFLAGS " -ex \"set non-stop on\""]
>      clean_restart $binfile
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PING]Re: [PATCH v2 3/3] PR remote/19496, timeout in forking-threads-plus-bkpt
  2016-02-11  0:28         ` [PATCH v2 3/3] PR remote/19496, timeout " Don Breazeal
@ 2016-02-25 17:29           ` Don Breazeal
  2016-03-03 18:20             ` [PING] " Don Breazeal
  2016-03-15 15:30           ` Pedro Alves
  1 sibling, 1 reply; 29+ messages in thread
From: Don Breazeal @ 2016-02-25 17:29 UTC (permalink / raw)
  To: gdb-patches, palves

Ping
Thanks,
--Don

On 2/10/2016 4:28 PM, Don Breazeal wrote:
> On 2/1/2016 12:09 PM, Pedro Alves wrote:
>> On 02/01/2016 07:29 PM, Don Breazeal wrote:
>>> On 2/1/2016 4:05 AM, Pedro Alves wrote:
>>
>>> Hi Pedro,
> ---snip---
>>> A fork event was reported to GDB before GDB knew about the parent thread,
>>> followed immediately by a breakpoint event in a different thread.  The
>>> parent thread was subsequently added via remote_notice_new_inferior in
>>> process_stop_reply, but when the thread was added the thread_info.state
>>> was set to THREAD_STOPPED.  The fork event was then handled correctly,
>>> but when the fork parent was resumed via a call to keep_going, the state
>>> was unchanged.
>>
>> Since this is non-stop, then it sounds to me like the bug is that the
>> thread should have been added in THREAD_RUNNING state.
>>
>> Consider that infrun may be pulling target events out of the target_ops
>> backend into its own event queue, but, not process them immediately.
>>
>> E.g., infrun may be stopping all threads temporarily for a step-over-breakpoint
>> operation for thread A (stop_all_threads).  The waitstatus of all threads
>> is thus left pending in the thread structure (save_status), including the
>> fork event of thread B.  Right at this point, if the user
>> does "info threads", that should show thread B (the fork parent) running,
>> not stopped, even if internally, gdb is holding it paused for a little bit.
>>
> 
> Hi Pedro,
> Here is a new patch that adds the threads with the state set to
> THREAD_RUNNING for fork events.
> Thanks!
> --Don
> 
> This patch addresses a failure in
> gdb.threads/forking-threads-plus-breakpoint.exp:
> 
> FAIL: gdb.threads/forking-threads-plus-breakpoint.exp: cond_bp_target=1:
> detach_on_fork=on: inferior 1 exited (timeout)
> 
> Cause:
> A fork event was reported to GDB before GDB knew about the parent thread,
> followed immediately by a breakpoint event in a different thread.  The
> parent thread was subsequently added via remote_notice_new_inferior in
> process_stop_reply, but when the thread was added the thread_info.state
> was set to THREAD_STOPPED.  The fork event was then handled correctly,
> but when the fork parent was resumed via a call to keep_going, the state
> was unchanged.
> 
> The breakpoint event was then handled, which caused all the non-breakpoint
> threads to be stopped.  When the breakpoint thread was resumed, all the
> non-breakpoint threads were resumed via infrun.c:restart_threads.  Our old
> fork parent wasn't restarted, because it still had thread_info.state set to
> THREAD_STOPPED.  Ultimately the program under debug hung waiting for a
> pthread_join while the old fork parent was stopped forever by GDB.
> 
> Fix:
> Make sure to add the fork parent thread in the THREAD_RUNNING state by
> calling remote_notice_new_inferior with RUNNING set to 1 when processing
> a fork stop reply.
> 
> Tested on x86_64 Linux and Nios II Linux target with x86 Linux host.
> 
> gdb/ChangeLog:
> 2016-02-10  Don Breazeal  <donb@codesourcery.com>
> 
> 	* remote.c (process_stop_reply): Call remote_notice_new_inferior
> 	with RUNNING set to 1 when handling fork events.
> 
> ---
>  gdb/remote.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/gdb/remote.c b/gdb/remote.c
> index f09a06e..ab750a7 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -6818,7 +6818,14 @@ process_stop_reply (struct stop_reply *stop_reply,
>  	  VEC_free (cached_reg_t, stop_reply->regcache);
>  	}
>  
> -      remote_notice_new_inferior (ptid, 0);
> +      /* If a fork event arrived before we knew about the parent thread,
> +	 make sure to mark it as running when it is created.  */
> +      if (status->kind == TARGET_WAITKIND_FORKED
> +	  || status->kind == TARGET_WAITKIND_VFORKED)
> +	remote_notice_new_inferior (ptid, 1);
> +      else
> +	remote_notice_new_inferior (ptid, 0);
> +
>        remote_thr = demand_private_info (ptid);
>        remote_thr->core = stop_reply->core;
>        remote_thr->stop_reason = stop_reply->stop_reason;
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PING] Re: [PATCH v2 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt
  2016-02-25 17:28           ` Don Breazeal
@ 2016-03-03 18:19             ` Don Breazeal
  2016-03-14 21:29               ` Don Breazeal
  0 siblings, 1 reply; 29+ messages in thread
From: Don Breazeal @ 2016-03-03 18:19 UTC (permalink / raw)
  To: gdb-patches, palves

Ping.
I checked, the patch still applies cleanly to mainline.
Thanks
--Don

On 2/25/2016 9:28 AM, Don Breazeal wrote:
> Ping
> Thanks,
> --Don
> 
> On 2/10/2016 4:28 PM, Don Breazeal wrote:
>> Hi Pedro,
>>
>> On 2/1/2016 11:38 AM, Pedro Alves wrote:
>>> On 01/28/2016 12:48 AM, Don Breazeal wrote:
>>>> This patch addresses "fork:Interrupted system call" (or wait:) failures
>>>> in gdb.threads/forking-threads-plus-breakpoint.exp.
>>>>
>>>> The test program spawns ten threads, each of which do ten fork/waitpid
>>>> sequences.  The cause of the problem was that when one of the fork
>>>> children exited before the corresponding fork parent could initiate its
>>>> waitpid for that child, a SIGCHLD was delivered and interrupted a fork
>>>> or waitpid in another thread.
>>
>> In fact, I think my diagnosis here was incorrect, or at least incorrect
>> in some cases.  I believe at least some of the interruptions are caused
>> by SIGSTOP, sent by GDB when stopping all the threads.  More below.
>>
>>>>
>>>> The fix was to wrap the system calls in a loop to retry the call if
>>>> it was interrupted, like:
>>>>
>>>> do
>>>>   {
>>>>     pid = fork ();
>>>>   }
>>>> while (pid == -1 && errno == EINTR);
>>>>
>>>> Since this is a Linux-only test I figure it is OK to use errno and EINTR.
>>>>
>>>> Tested on Nios II Linux target with x86 Linux host.
>>>
>>> I'd prefer to avoid this if possible.  These loops potentially hide
>>> bugs like ERESTARTSYS escaping out of a syscall and mishandling of
>>> signals.  See bc9540e842eb5639ca59cb133adef211d252843c for example:
>>>    https://sourceware.org/ml/gdb-patches/2015-02/msg00654.html
>>>
>>> How about setting SIGCHLD to SIG_IGN, or making SIGCHLD be SA_RESTART?
>>
>> I spent a couple of days trying to find an alternate solution, but
>> couldn't find one that was reliable.  Here is a snapshot of what I tried:
>>
>> 1) SIG_IGN: results in an ECHILD from waitpid.  The man page for waitpid
>>    says "This can happen for one's own child if the action for SIGCHLD is
>>    set to SIG_IGN."
>>
>> 2) SA_RESTART: While waitpid is listed as a system call that can be
>>    restarted by SA_RESTART, fork is not.  Even if I leave the "EINTR loop"
>>    in place for fork, using SA_RESTART I still see an interrupted system
>>    call for waitpid.  Possibly because the problem is SIGSTOP and not
>>    SIGCHLD.
>>
>> 3) pthread_sigblock: With this set for SIGCHLD in all the threads, I
>>    still saw an interrupted system call.  You can't block SIGSTOP.
>>
>> 4) pthread_sigblock with sigwait: using pthread_sigblock on all the
>>    blockable signals with a signal thread that called sigwait for all
>>    the signals in a loop, the signal thread would see a bunch of SIGCHLDs,
>>    but there would eventually be an interrupted system call.
>>
>> 5) bsd_signal: this function is supposed to automatically restart blocking
>>    system calls.  fork is not a blocking system call, but it doesn't help
>>    for waitpid either.
>>
>> I found this in the ptrace(2) man page: "Note that a suppressed signal
>> still causes system calls to return prematurely.  In this case, system
>> calls will be restarted: the tracer will observe the tracee to reexecute
>> the interrupted system call (or restart_syscall(2) system call for a few
>> system calls which use a different mechanism for restarting) if the tracer
>> uses PTRACE_SYSCALL.  Even system calls (such as poll(2)) which are not
>> restartable after signal are restarted after signal is suppressed; however,
>> kernel bugs exist which cause some system calls to fail with EINTR even
>> though no observable signal is injected to the tracee."
>>
>> The GDB manual mentions something similar about interrupted system calls.
>>
>> So, the bottom line is that I haven't changed the fix for the interrupted
>> system calls, because I can't find anything that works as well as the
>> original fix.  Perhaps this test puts enough stress on the kernel that the
>> kernel bugs mentioned above are exposed.
>>
>> One change I did make from the previous version was to increase the
>> timeout to 90 seconds, which was necessary to get more reliable results
>> on the Nios II target.
>>
>> Let me know what you think.
>> Thanks!
>> --Don
>>
>> ---
>> This patch addresses "fork:Interrupted system call" (or wait:) failures
>> in gdb.threads/forking-threads-plus-breakpoint.exp.
>>
>> The test program spawns ten threads, each of which do ten fork/waitpid
>> sequences.  The cause of the problem was that when one of the fork
>> children exited before the corresponding fork parent could initiate its
>> waitpid for that child, a SIGCHLD was delivered and interrupted a fork
>> or waitpid in another thread.
>>
>> The fix was to wrap the system calls in a loop to retry the call if
>> it was interrupted, like:
>>
>> do
>>   {
>>     pid = fork ();
>>   }
>> while (pid == -1 && errno == EINTR);
>>
>> Since this is a Linux-only test I figure it is OK to use errno and EINTR.
>> I tried a number of alternative fixes using SIG_IGN, SA_RESTART,
>> pthread_sigblock, and bsd_signal, but none of these worked as well.
>>
>> Tested on Nios II Linux target with x86 Linux host.
>>
>> gdb/testsuite/ChangeLog:
>> 2016-02-10  Don Breazeal  <donb@codesourcery.com>
>>
>> 	* gdb.threads/forking-threads-plus-breakpoint.c (thread_forks):
>> 	Retry fork and waitpid on interrupted system call errors.
>> 	* gdb.threads/forking-threads-plus-breakpoint.exp: (do_test):
>> 	Increase timeout to 90.
>>
>> ---
>>  .../gdb.threads/forking-threads-plus-breakpoint.c          | 14 ++++++++++++--
>>  .../gdb.threads/forking-threads-plus-breakpoint.exp        |  3 +++
>>  2 files changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
>> index fc64d93..c169e18 100644
>> --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
>> +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
>> @@ -22,6 +22,7 @@
>>  #include <sys/types.h>
>>  #include <sys/wait.h>
>>  #include <stdlib.h>
>> +#include <errno.h>
>>  
>>  /* Number of threads.  Each thread continuously spawns a fork and wait
>>     for it.  If we have another thread continuously start a step over,
>> @@ -49,14 +50,23 @@ thread_forks (void *arg)
>>      {
>>        pid_t pid;
>>  
>> -      pid = fork ();
>> +      do
>> +	{
>> +	  pid = fork ();
>> +	}
>> +      while (pid == -1 && errno == EINTR);
>>  
>>        if (pid > 0)
>>  	{
>>  	  int status;
>>  
>>  	  /* Parent.  */
>> -	  pid = waitpid (pid, &status, 0);
>> +	  do
>> +	    {
>> +	      pid = waitpid (pid, &status, 0);
>> +	    }
>> +	  while (pid == -1 && errno == EINTR);
>> +
>>  	  if (pid == -1)
>>  	    {
>>  	      perror ("wait");
>> diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
>> index ff3ca9a..6889c2b 100644
>> --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
>> +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
>> @@ -73,6 +73,9 @@ proc do_test { cond_bp_target detach_on_fork displaced } {
>>      global linenum
>>      global is_remote_target
>>  
>> +    global timeout
>> +    set timeout 90
>> +
>>      set saved_gdbflags $GDBFLAGS
>>      set GDBFLAGS [concat $GDBFLAGS " -ex \"set non-stop on\""]
>>      clean_restart $binfile
>>
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PING] Re: [PATCH v2 3/3] PR remote/19496, timeout in forking-threads-plus-bkpt
  2016-02-25 17:29           ` [PING]Re: " Don Breazeal
@ 2016-03-03 18:20             ` Don Breazeal
  2016-03-14 21:30               ` Don Breazeal
  0 siblings, 1 reply; 29+ messages in thread
From: Don Breazeal @ 2016-03-03 18:20 UTC (permalink / raw)
  To: gdb-patches, palves

Ping.
I checked, the patch still applies cleanly to mainline.
Thanks
--Don

On 2/25/2016 9:29 AM, Don Breazeal wrote:
> Ping
> Thanks,
> --Don
> 
> On 2/10/2016 4:28 PM, Don Breazeal wrote:
>> On 2/1/2016 12:09 PM, Pedro Alves wrote:
>>> On 02/01/2016 07:29 PM, Don Breazeal wrote:
>>>> On 2/1/2016 4:05 AM, Pedro Alves wrote:
>>>
>>>> Hi Pedro,
>> ---snip---
>>>> A fork event was reported to GDB before GDB knew about the parent thread,
>>>> followed immediately by a breakpoint event in a different thread.  The
>>>> parent thread was subsequently added via remote_notice_new_inferior in
>>>> process_stop_reply, but when the thread was added the thread_info.state
>>>> was set to THREAD_STOPPED.  The fork event was then handled correctly,
>>>> but when the fork parent was resumed via a call to keep_going, the state
>>>> was unchanged.
>>>
>>> Since this is non-stop, then it sounds to me like the bug is that the
>>> thread should have been added in THREAD_RUNNING state.
>>>
>>> Consider that infrun may be pulling target events out of the target_ops
>>> backend into its own event queue, but, not process them immediately.
>>>
>>> E.g., infrun may be stopping all threads temporarily for a step-over-breakpoint
>>> operation for thread A (stop_all_threads).  The waitstatus of all threads
>>> is thus left pending in the thread structure (save_status), including the
>>> fork event of thread B.  Right at this point, if the user
>>> does "info threads", that should show thread B (the fork parent) running,
>>> not stopped, even if internally, gdb is holding it paused for a little bit.
>>>
>>
>> Hi Pedro,
>> Here is a new patch that adds the threads with the state set to
>> THREAD_RUNNING for fork events.
>> Thanks!
>> --Don
>>
>> This patch addresses a failure in
>> gdb.threads/forking-threads-plus-breakpoint.exp:
>>
>> FAIL: gdb.threads/forking-threads-plus-breakpoint.exp: cond_bp_target=1:
>> detach_on_fork=on: inferior 1 exited (timeout)
>>
>> Cause:
>> A fork event was reported to GDB before GDB knew about the parent thread,
>> followed immediately by a breakpoint event in a different thread.  The
>> parent thread was subsequently added via remote_notice_new_inferior in
>> process_stop_reply, but when the thread was added the thread_info.state
>> was set to THREAD_STOPPED.  The fork event was then handled correctly,
>> but when the fork parent was resumed via a call to keep_going, the state
>> was unchanged.
>>
>> The breakpoint event was then handled, which caused all the non-breakpoint
>> threads to be stopped.  When the breakpoint thread was resumed, all the
>> non-breakpoint threads were resumed via infrun.c:restart_threads.  Our old
>> fork parent wasn't restarted, because it still had thread_info.state set to
>> THREAD_STOPPED.  Ultimately the program under debug hung waiting for a
>> pthread_join while the old fork parent was stopped forever by GDB.
>>
>> Fix:
>> Make sure to add the fork parent thread in the THREAD_RUNNING state by
>> calling remote_notice_new_inferior with RUNNING set to 1 when processing
>> a fork stop reply.
>>
>> Tested on x86_64 Linux and Nios II Linux target with x86 Linux host.
>>
>> gdb/ChangeLog:
>> 2016-02-10  Don Breazeal  <donb@codesourcery.com>
>>
>> 	* remote.c (process_stop_reply): Call remote_notice_new_inferior
>> 	with RUNNING set to 1 when handling fork events.
>>
>> ---
>>  gdb/remote.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/gdb/remote.c b/gdb/remote.c
>> index f09a06e..ab750a7 100644
>> --- a/gdb/remote.c
>> +++ b/gdb/remote.c
>> @@ -6818,7 +6818,14 @@ process_stop_reply (struct stop_reply *stop_reply,
>>  	  VEC_free (cached_reg_t, stop_reply->regcache);
>>  	}
>>  
>> -      remote_notice_new_inferior (ptid, 0);
>> +      /* If a fork event arrived before we knew about the parent thread,
>> +	 make sure to mark it as running when it is created.  */
>> +      if (status->kind == TARGET_WAITKIND_FORKED
>> +	  || status->kind == TARGET_WAITKIND_VFORKED)
>> +	remote_notice_new_inferior (ptid, 1);
>> +      else
>> +	remote_notice_new_inferior (ptid, 0);
>> +
>>        remote_thr = demand_private_info (ptid);
>>        remote_thr->core = stop_reply->core;
>>        remote_thr->stop_reason = stop_reply->stop_reason;
>>
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PING] Re: [PATCH v2 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt
  2016-03-03 18:19             ` [PING] " Don Breazeal
@ 2016-03-14 21:29               ` Don Breazeal
  0 siblings, 0 replies; 29+ messages in thread
From: Don Breazeal @ 2016-03-14 21:29 UTC (permalink / raw)
  To: gdb-patches, palves

Hi Pedro,
Did you have any more suggestions about handling the interrupted system
call, or shall we go with the "loop until we don't get -1 and
errno===EINTR" approach?
Thanks,
--Don

On 3/3/2016 10:19 AM, Don Breazeal wrote:
> Ping.
> I checked, the patch still applies cleanly to mainline.
> Thanks
> --Don
> 
> On 2/25/2016 9:28 AM, Don Breazeal wrote:
>> Ping
>> Thanks,
>> --Don
>>
>> On 2/10/2016 4:28 PM, Don Breazeal wrote:
>>> Hi Pedro,
>>>
>>> On 2/1/2016 11:38 AM, Pedro Alves wrote:
>>>> On 01/28/2016 12:48 AM, Don Breazeal wrote:
>>>>> This patch addresses "fork:Interrupted system call" (or wait:) failures
>>>>> in gdb.threads/forking-threads-plus-breakpoint.exp.
>>>>>
>>>>> The test program spawns ten threads, each of which do ten fork/waitpid
>>>>> sequences.  The cause of the problem was that when one of the fork
>>>>> children exited before the corresponding fork parent could initiate its
>>>>> waitpid for that child, a SIGCHLD was delivered and interrupted a fork
>>>>> or waitpid in another thread.
>>>
>>> In fact, I think my diagnosis here was incorrect, or at least incorrect
>>> in some cases.  I believe at least some of the interruptions are caused
>>> by SIGSTOP, sent by GDB when stopping all the threads.  More below.
>>>
>>>>>
>>>>> The fix was to wrap the system calls in a loop to retry the call if
>>>>> it was interrupted, like:
>>>>>
>>>>> do
>>>>>   {
>>>>>     pid = fork ();
>>>>>   }
>>>>> while (pid == -1 && errno == EINTR);
>>>>>
>>>>> Since this is a Linux-only test I figure it is OK to use errno and EINTR.
>>>>>
>>>>> Tested on Nios II Linux target with x86 Linux host.
>>>>
>>>> I'd prefer to avoid this if possible.  These loops potentially hide
>>>> bugs like ERESTARTSYS escaping out of a syscall and mishandling of
>>>> signals.  See bc9540e842eb5639ca59cb133adef211d252843c for example:
>>>>    https://sourceware.org/ml/gdb-patches/2015-02/msg00654.html
>>>>
>>>> How about setting SIGCHLD to SIG_IGN, or making SIGCHLD be SA_RESTART?
>>>
>>> I spent a couple of days trying to find an alternate solution, but
>>> couldn't find one that was reliable.  Here is a snapshot of what I tried:
>>>
>>> 1) SIG_IGN: results in an ECHILD from waitpid.  The man page for waitpid
>>>    says "This can happen for one's own child if the action for SIGCHLD is
>>>    set to SIG_IGN."
>>>
>>> 2) SA_RESTART: While waitpid is listed as a system call that can be
>>>    restarted by SA_RESTART, fork is not.  Even if I leave the "EINTR loop"
>>>    in place for fork, using SA_RESTART I still see an interrupted system
>>>    call for waitpid.  Possibly because the problem is SIGSTOP and not
>>>    SIGCHLD.
>>>
>>> 3) pthread_sigblock: With this set for SIGCHLD in all the threads, I
>>>    still saw an interrupted system call.  You can't block SIGSTOP.
>>>
>>> 4) pthread_sigblock with sigwait: using pthread_sigblock on all the
>>>    blockable signals with a signal thread that called sigwait for all
>>>    the signals in a loop, the signal thread would see a bunch of SIGCHLDs,
>>>    but there would eventually be an interrupted system call.
>>>
>>> 5) bsd_signal: this function is supposed to automatically restart blocking
>>>    system calls.  fork is not a blocking system call, but it doesn't help
>>>    for waitpid either.
>>>
>>> I found this in the ptrace(2) man page: "Note that a suppressed signal
>>> still causes system calls to return prematurely.  In this case, system
>>> calls will be restarted: the tracer will observe the tracee to reexecute
>>> the interrupted system call (or restart_syscall(2) system call for a few
>>> system calls which use a different mechanism for restarting) if the tracer
>>> uses PTRACE_SYSCALL.  Even system calls (such as poll(2)) which are not
>>> restartable after signal are restarted after signal is suppressed; however,
>>> kernel bugs exist which cause some system calls to fail with EINTR even
>>> though no observable signal is injected to the tracee."
>>>
>>> The GDB manual mentions something similar about interrupted system calls.
>>>
>>> So, the bottom line is that I haven't changed the fix for the interrupted
>>> system calls, because I can't find anything that works as well as the
>>> original fix.  Perhaps this test puts enough stress on the kernel that the
>>> kernel bugs mentioned above are exposed.
>>>
>>> One change I did make from the previous version was to increase the
>>> timeout to 90 seconds, which was necessary to get more reliable results
>>> on the Nios II target.
>>>
>>> Let me know what you think.
>>> Thanks!
>>> --Don
>>>
>>> ---
>>> This patch addresses "fork:Interrupted system call" (or wait:) failures
>>> in gdb.threads/forking-threads-plus-breakpoint.exp.
>>>
>>> The test program spawns ten threads, each of which do ten fork/waitpid
>>> sequences.  The cause of the problem was that when one of the fork
>>> children exited before the corresponding fork parent could initiate its
>>> waitpid for that child, a SIGCHLD was delivered and interrupted a fork
>>> or waitpid in another thread.
>>>
>>> The fix was to wrap the system calls in a loop to retry the call if
>>> it was interrupted, like:
>>>
>>> do
>>>   {
>>>     pid = fork ();
>>>   }
>>> while (pid == -1 && errno == EINTR);
>>>
>>> Since this is a Linux-only test I figure it is OK to use errno and EINTR.
>>> I tried a number of alternative fixes using SIG_IGN, SA_RESTART,
>>> pthread_sigblock, and bsd_signal, but none of these worked as well.
>>>
>>> Tested on Nios II Linux target with x86 Linux host.
>>>
>>> gdb/testsuite/ChangeLog:
>>> 2016-02-10  Don Breazeal  <donb@codesourcery.com>
>>>
>>> 	* gdb.threads/forking-threads-plus-breakpoint.c (thread_forks):
>>> 	Retry fork and waitpid on interrupted system call errors.
>>> 	* gdb.threads/forking-threads-plus-breakpoint.exp: (do_test):
>>> 	Increase timeout to 90.
>>>
>>> ---
>>>  .../gdb.threads/forking-threads-plus-breakpoint.c          | 14 ++++++++++++--
>>>  .../gdb.threads/forking-threads-plus-breakpoint.exp        |  3 +++
>>>  2 files changed, 15 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
>>> index fc64d93..c169e18 100644
>>> --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
>>> +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
>>> @@ -22,6 +22,7 @@
>>>  #include <sys/types.h>
>>>  #include <sys/wait.h>
>>>  #include <stdlib.h>
>>> +#include <errno.h>
>>>  
>>>  /* Number of threads.  Each thread continuously spawns a fork and wait
>>>     for it.  If we have another thread continuously start a step over,
>>> @@ -49,14 +50,23 @@ thread_forks (void *arg)
>>>      {
>>>        pid_t pid;
>>>  
>>> -      pid = fork ();
>>> +      do
>>> +	{
>>> +	  pid = fork ();
>>> +	}
>>> +      while (pid == -1 && errno == EINTR);
>>>  
>>>        if (pid > 0)
>>>  	{
>>>  	  int status;
>>>  
>>>  	  /* Parent.  */
>>> -	  pid = waitpid (pid, &status, 0);
>>> +	  do
>>> +	    {
>>> +	      pid = waitpid (pid, &status, 0);
>>> +	    }
>>> +	  while (pid == -1 && errno == EINTR);
>>> +
>>>  	  if (pid == -1)
>>>  	    {
>>>  	      perror ("wait");
>>> diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
>>> index ff3ca9a..6889c2b 100644
>>> --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
>>> +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
>>> @@ -73,6 +73,9 @@ proc do_test { cond_bp_target detach_on_fork displaced } {
>>>      global linenum
>>>      global is_remote_target
>>>  
>>> +    global timeout
>>> +    set timeout 90
>>> +
>>>      set saved_gdbflags $GDBFLAGS
>>>      set GDBFLAGS [concat $GDBFLAGS " -ex \"set non-stop on\""]
>>>      clean_restart $binfile
>>>
>>
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PING] Re: [PATCH v2 3/3] PR remote/19496, timeout in forking-threads-plus-bkpt
  2016-03-03 18:20             ` [PING] " Don Breazeal
@ 2016-03-14 21:30               ` Don Breazeal
  0 siblings, 0 replies; 29+ messages in thread
From: Don Breazeal @ 2016-03-14 21:30 UTC (permalink / raw)
  To: gdb-patches, palves

Ping.
Thanks,
--Don

On 3/3/2016 10:20 AM, Don Breazeal wrote:
> Ping.
> I checked, the patch still applies cleanly to mainline.
> Thanks
> --Don
> 
> On 2/25/2016 9:29 AM, Don Breazeal wrote:
>> Ping
>> Thanks,
>> --Don
>>
>> On 2/10/2016 4:28 PM, Don Breazeal wrote:
>>> On 2/1/2016 12:09 PM, Pedro Alves wrote:
>>>> On 02/01/2016 07:29 PM, Don Breazeal wrote:
>>>>> On 2/1/2016 4:05 AM, Pedro Alves wrote:
>>>>
>>>>> Hi Pedro,
>>> ---snip---
>>>>> A fork event was reported to GDB before GDB knew about the parent thread,
>>>>> followed immediately by a breakpoint event in a different thread.  The
>>>>> parent thread was subsequently added via remote_notice_new_inferior in
>>>>> process_stop_reply, but when the thread was added the thread_info.state
>>>>> was set to THREAD_STOPPED.  The fork event was then handled correctly,
>>>>> but when the fork parent was resumed via a call to keep_going, the state
>>>>> was unchanged.
>>>>
>>>> Since this is non-stop, then it sounds to me like the bug is that the
>>>> thread should have been added in THREAD_RUNNING state.
>>>>
>>>> Consider that infrun may be pulling target events out of the target_ops
>>>> backend into its own event queue, but, not process them immediately.
>>>>
>>>> E.g., infrun may be stopping all threads temporarily for a step-over-breakpoint
>>>> operation for thread A (stop_all_threads).  The waitstatus of all threads
>>>> is thus left pending in the thread structure (save_status), including the
>>>> fork event of thread B.  Right at this point, if the user
>>>> does "info threads", that should show thread B (the fork parent) running,
>>>> not stopped, even if internally, gdb is holding it paused for a little bit.
>>>>
>>>
>>> Hi Pedro,
>>> Here is a new patch that adds the threads with the state set to
>>> THREAD_RUNNING for fork events.
>>> Thanks!
>>> --Don
>>>
>>> This patch addresses a failure in
>>> gdb.threads/forking-threads-plus-breakpoint.exp:
>>>
>>> FAIL: gdb.threads/forking-threads-plus-breakpoint.exp: cond_bp_target=1:
>>> detach_on_fork=on: inferior 1 exited (timeout)
>>>
>>> Cause:
>>> A fork event was reported to GDB before GDB knew about the parent thread,
>>> followed immediately by a breakpoint event in a different thread.  The
>>> parent thread was subsequently added via remote_notice_new_inferior in
>>> process_stop_reply, but when the thread was added the thread_info.state
>>> was set to THREAD_STOPPED.  The fork event was then handled correctly,
>>> but when the fork parent was resumed via a call to keep_going, the state
>>> was unchanged.
>>>
>>> The breakpoint event was then handled, which caused all the non-breakpoint
>>> threads to be stopped.  When the breakpoint thread was resumed, all the
>>> non-breakpoint threads were resumed via infrun.c:restart_threads.  Our old
>>> fork parent wasn't restarted, because it still had thread_info.state set to
>>> THREAD_STOPPED.  Ultimately the program under debug hung waiting for a
>>> pthread_join while the old fork parent was stopped forever by GDB.
>>>
>>> Fix:
>>> Make sure to add the fork parent thread in the THREAD_RUNNING state by
>>> calling remote_notice_new_inferior with RUNNING set to 1 when processing
>>> a fork stop reply.
>>>
>>> Tested on x86_64 Linux and Nios II Linux target with x86 Linux host.
>>>
>>> gdb/ChangeLog:
>>> 2016-02-10  Don Breazeal  <donb@codesourcery.com>
>>>
>>> 	* remote.c (process_stop_reply): Call remote_notice_new_inferior
>>> 	with RUNNING set to 1 when handling fork events.
>>>
>>> ---
>>>  gdb/remote.c | 9 ++++++++-
>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/gdb/remote.c b/gdb/remote.c
>>> index f09a06e..ab750a7 100644
>>> --- a/gdb/remote.c
>>> +++ b/gdb/remote.c
>>> @@ -6818,7 +6818,14 @@ process_stop_reply (struct stop_reply *stop_reply,
>>>  	  VEC_free (cached_reg_t, stop_reply->regcache);
>>>  	}
>>>  
>>> -      remote_notice_new_inferior (ptid, 0);
>>> +      /* If a fork event arrived before we knew about the parent thread,
>>> +	 make sure to mark it as running when it is created.  */
>>> +      if (status->kind == TARGET_WAITKIND_FORKED
>>> +	  || status->kind == TARGET_WAITKIND_VFORKED)
>>> +	remote_notice_new_inferior (ptid, 1);
>>> +      else
>>> +	remote_notice_new_inferior (ptid, 0);
>>> +
>>>        remote_thr = demand_private_info (ptid);
>>>        remote_thr->core = stop_reply->core;
>>>        remote_thr->stop_reason = stop_reply->stop_reason;
>>>
>>
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt
  2016-02-11  0:28         ` [PATCH v2 2/3] PR remote/19496, interrupted syscall " Don Breazeal
  2016-02-25 17:28           ` Don Breazeal
@ 2016-03-15 12:55           ` Pedro Alves
  2016-03-16 18:26             ` Don Breazeal
  1 sibling, 1 reply; 29+ messages in thread
From: Pedro Alves @ 2016-03-15 12:55 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches

Hi Don,

On 02/11/2016 12:28 AM, Don Breazeal wrote:

> So, the bottom line is that I haven't changed the fix for the interrupted
> system calls, because I can't find anything that works as well as the
> original fix.  Perhaps this test puts enough stress on the kernel that the
> kernel bugs mentioned above are exposed.

Many thanks for the thorough investigation.  Let's take your
original approach then.

> 
> One change I did make from the previous version was to increase the
> timeout to 90 seconds, which was necessary to get more reliable results
> on the Nios II target.
> diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
> index ff3ca9a..6889c2b 100644
> --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
> +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
> @@ -73,6 +73,9 @@ proc do_test { cond_bp_target detach_on_fork displaced } {
>       global linenum
>       global is_remote_target
>   
> +    global timeout
> +    set timeout 90
> +

Use with_timeout_factor instead so that the timeout is properly restored,
and put it around the problematic test, only, instead of basically
around the whole test case.  I think that'll be the "inferior 1 exited"
test?

>       set saved_gdbflags $GDBFLAGS
>       set GDBFLAGS [concat $GDBFLAGS " -ex \"set non-stop on\""]
>       clean_restart $binfile

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/3] PR remote/19496, timeout in forking-threads-plus-bkpt
  2016-02-11  0:28         ` [PATCH v2 3/3] PR remote/19496, timeout " Don Breazeal
  2016-02-25 17:29           ` [PING]Re: " Don Breazeal
@ 2016-03-15 15:30           ` Pedro Alves
  2016-03-16 17:29             ` Don Breazeal
  1 sibling, 1 reply; 29+ messages in thread
From: Pedro Alves @ 2016-03-15 15:30 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches

Hi Don,

Sorry for the delay.  I had to find a moment to think this through.

On 02/11/2016 12:28 AM, Don Breazeal wrote:
> On 2/1/2016 12:09 PM, Pedro Alves wrote:
>> On 02/01/2016 07:29 PM, Don Breazeal wrote:
>>> On 2/1/2016 4:05 AM, Pedro Alves wrote:
>>
>>> Hi Pedro,
> ---snip---
>>> A fork event was reported to GDB before GDB knew about the parent thread,
>>> followed immediately by a breakpoint event in a different thread.  The
>>> parent thread was subsequently added via remote_notice_new_inferior in
>>> process_stop_reply, but when the thread was added the thread_info.state
>>> was set to THREAD_STOPPED.  The fork event was then handled correctly,
>>> but when the fork parent was resumed via a call to keep_going, the state
>>> was unchanged.
>>
>> Since this is non-stop, then it sounds to me like the bug is that the
>> thread should have been added in THREAD_RUNNING state.
>>
>> Consider that infrun may be pulling target events out of the target_ops
>> backend into its own event queue, but, not process them immediately.
>>
>> E.g., infrun may be stopping all threads temporarily for a step-over-breakpoint
>> operation for thread A (stop_all_threads).  The waitstatus of all threads
>> is thus left pending in the thread structure (save_status), including the
>> fork event of thread B.  Right at this point, if the user
>> does "info threads", that should show thread B (the fork parent) running,
>> not stopped, even if internally, gdb is holding it paused for a little bit.
>>
> 
> Hi Pedro,
> Here is a new patch that adds the threads with the state set to
> THREAD_RUNNING for fork events.

I really meant it for _all_ kinds of events, not just fork.

I don't think we can do this:

+      if (status->kind == TARGET_WAITKIND_FORKED
+	  || status->kind == TARGET_WAITKIND_VFORKED)
+	remote_notice_new_inferior (ptid, 1);

in all-stop mode, as passing 1 means we'll end up installing a
continuation that immediately resumes the whole process.

+      else
+       remote_notice_new_inferior (ptid, 0);

and with this, if in non-stop, and the event ends up pending
on the infrun side, "info threads" will be confused for that
thread too.

So the fix requires a bit of plumbing to pass down the correct
external and internal thread states.  Something like the patch
below.

WDYT?  Does it fix the timeout you observe?

(We should really come up with better, less confusing names for the
"running" vs "executing" distinction.  Maybe external/internal running
state, or user/internal, or public/private...)

--------------
[PATCH] PR remote/19496, timeout in forking-threads-plus-bkpt

This patch addresses a failure in
gdb.threads/forking-threads-plus-breakpoint.exp:

FAIL: gdb.threads/forking-threads-plus-breakpoint.exp: cond_bp_target=1:
detach_on_fork=on: inferior 1 exited (timeout)

Cause:

A fork event was reported to GDB before GDB knew about the parent
thread, followed immediately by a breakpoint event in a different
thread.  The parent thread was subsequently added via
remote_notice_new_inferior in process_stop_reply, but when the thread
was added the thread_info.state was set to THREAD_STOPPED.  The fork
event was then handled correctly, but when the fork parent was resumed
via a call to keep_going, the state was unchanged.

The breakpoint event was then handled, which caused all the
non-breakpoint threads to be stopped.  When the breakpoint thread was
resumed, all the non-breakpoint threads were resumed via
infrun.c:restart_threads.  Our old fork parent wasn't restarted,
because it still had thread_info.state set to THREAD_STOPPED.
Ultimately the program under debug hung waiting for a pthread_join
while the old fork parent was stopped forever by GDB.

Fix:

Since this is non-stop, then the bug is that the thread should have
been added in THREAD_RUNNING state.  Consider that infrun may be
pulling target events out of the target_ops backend into its own event
queue, but, not process them immediately.  E.g., infrun may be
stopping all threads temporarily for a step-over-breakpoint operation
for thread A (stop_all_threads).  The waitstatus of all threads is
thus left pending in the thread structure (save_status), including the
fork event of thread B.  Right at this point, if the user does "info
threads", that should show thread B (the fork parent) running, not
stopped, even if internally, gdb is holding it paused for a little
bit.

Thus if in non-stop mode, always add new threads in the external
user-visible THREAD_RUNNING state.  Change remote_notice_new_inferior
to accept the internal executing state of the thread instead, with
EXECUTING set to 1 when we discover a thread that is running on the
target (such as through remote_update_thread_list), and 0 when the
thread is really paused (such as when we see a stop reply).

Tested on x86_64 Linux and Nios II Linux target with x86 Linux host.

gdb/ChangeLog:
2016-03-15  Pedro Alves  <palves@redhat.com>
	    Don Breazeal  <donb@codesourcery.com>

	* infcmd.c (notice_new_inferior): Use the 'leave_running' argument
	instead of checking the 'non_stop' global.

	* remote.c (remote_add_thread): New parameter 'executing'.  Use it
	to set the new thread's executing state.
	(remote_notice_new_inferior): Rename parameter 'running' to
	'executing'.  Always set the thread state to THREAD_RUNNING in
	non-stop mode, and to THREAD_STOPPED in all-stop mode.  Pass
	EXECUTING to remote_add_thread and notice_new_inferior.
	(remote_update_thread_list): Update to pass executing state, not
	running state.
---
 gdb/infcmd.c |  7 +------
 gdb/remote.c | 30 ++++++++++++++++++------------
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/gdb/infcmd.c b/gdb/infcmd.c
index a80bf0a..d687116 100644
--- a/gdb/infcmd.c
+++ b/gdb/infcmd.c
@@ -2903,11 +2903,7 @@ notice_new_inferior (ptid_t ptid, int leave_running, int from_tty)
 
   old_chain = make_cleanup (null_cleanup, NULL);
 
-  /* If in non-stop, leave threads as running as they were.  If
-     they're stopped for some reason other than us telling it to, the
-     target reports a signal != GDB_SIGNAL_0.  We don't try to
-     resume threads with such a stop signal.  */
-  mode = non_stop ? ATTACH_POST_WAIT_RESUME : ATTACH_POST_WAIT_NOTHING;
+  mode = leave_running ? ATTACH_POST_WAIT_RESUME : ATTACH_POST_WAIT_NOTHING;
 
   if (!ptid_equal (inferior_ptid, null_ptid))
     make_cleanup_restore_current_thread ();
@@ -2943,7 +2939,6 @@ notice_new_inferior (ptid_t ptid, int leave_running, int from_tty)
       return;
     }
 
-  mode = leave_running ? ATTACH_POST_WAIT_RESUME : ATTACH_POST_WAIT_NOTHING;
   attach_post_wait ("" /* args */, from_tty, mode);
 
   do_cleanups (old_chain);
diff --git a/gdb/remote.c b/gdb/remote.c
index f09a06e..af0a08a 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -1801,7 +1801,7 @@ remote_add_inferior (int fake_pid_p, int pid, int attached,
    according to RUNNING.  */
 
 static void
-remote_add_thread (ptid_t ptid, int running)
+remote_add_thread (ptid_t ptid, int running, int executing)
 {
   struct remote_state *rs = get_remote_state ();
 
@@ -1816,7 +1816,7 @@ remote_add_thread (ptid_t ptid, int running)
   else
     add_thread (ptid);
 
-  set_executing (ptid, running);
+  set_executing (ptid, executing);
   set_running (ptid, running);
 }
 
@@ -1824,11 +1824,17 @@ remote_add_thread (ptid_t ptid, int running)
    It may be the first time we hear about such thread, so take the
    opportunity to add it to GDB's thread list.  In case this is the
    first time we're noticing its corresponding inferior, add it to
-   GDB's inferior list as well.  */
+   GDB's inferior list as well.  EXECUTING indicates whether the
+   thread is (internally) executing or stopped.  */
 
 static void
-remote_notice_new_inferior (ptid_t currthread, int running)
+remote_notice_new_inferior (ptid_t currthread, int executing)
 {
+  /* In non-stop mode, we assume new found threads are (externally)
+     running until proven otherwise with a stop reply.  In all-stop,
+     we can only get here if all threads are stopped.  */
+  int running = target_is_non_stop_p () ? 1 : 0;
+
   /* If this is a new thread, add it to GDB's thread list.
      If we leave it up to WFI to do this, bad things will happen.  */
 
@@ -1836,7 +1842,7 @@ remote_notice_new_inferior (ptid_t currthread, int running)
     {
       /* We're seeing an event on a thread id we knew had exited.
 	 This has to be a new thread reusing the old id.  Add it.  */
-      remote_add_thread (currthread, running);
+      remote_add_thread (currthread, running, executing);
       return;
     }
 
@@ -1857,7 +1863,7 @@ remote_notice_new_inferior (ptid_t currthread, int running)
 	    thread_change_ptid (inferior_ptid, currthread);
 	  else
 	    {
-	      remote_add_thread (currthread, running);
+	      remote_add_thread (currthread, running, executing);
 	      inferior_ptid = currthread;
 	    }
 	  return;
@@ -1888,7 +1894,7 @@ remote_notice_new_inferior (ptid_t currthread, int running)
 	}
 
       /* This is really a new thread.  Add it.  */
-      remote_add_thread (currthread, running);
+      remote_add_thread (currthread, running, executing);
 
       /* If we found a new inferior, let the common code do whatever
 	 it needs to with it (e.g., read shared libraries, insert
@@ -1899,7 +1905,7 @@ remote_notice_new_inferior (ptid_t currthread, int running)
 	  struct remote_state *rs = get_remote_state ();
 
 	  if (!rs->starting_up)
-	    notice_new_inferior (currthread, running, 0);
+	    notice_new_inferior (currthread, executing, 0);
 	}
     }
 }
@@ -3259,12 +3265,12 @@ remote_update_thread_list (struct target_ops *ops)
 	    {
 	      struct private_thread_info *info;
 	      /* In non-stop mode, we assume new found threads are
-		 running until proven otherwise with a stop reply.  In
-		 all-stop, we can only get here if all threads are
+		 executing until proven otherwise with a stop reply.
+		 In all-stop, we can only get here if all threads are
 		 stopped.  */
-	      int running = target_is_non_stop_p () ? 1 : 0;
+	      int executing = target_is_non_stop_p () ? 1 : 0;
 
-	      remote_notice_new_inferior (item->ptid, running);
+	      remote_notice_new_inferior (item->ptid, executing);
 
 	      info = demand_private_info (item->ptid);
 	      info->core = item->core;
-- 
2.5.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/3] PR remote/19496, timeout in forking-threads-plus-bkpt
  2016-03-15 15:30           ` Pedro Alves
@ 2016-03-16 17:29             ` Don Breazeal
  2016-03-16 22:51               ` Don Breazeal
  0 siblings, 1 reply; 29+ messages in thread
From: Don Breazeal @ 2016-03-16 17:29 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 3/15/2016 8:29 AM, Pedro Alves wrote:
> Hi Don,
> 
> Sorry for the delay.  I had to find a moment to think this through.
> 
> On 02/11/2016 12:28 AM, Don Breazeal wrote:
>> On 2/1/2016 12:09 PM, Pedro Alves wrote:
>>> On 02/01/2016 07:29 PM, Don Breazeal wrote:
>>>> On 2/1/2016 4:05 AM, Pedro Alves wrote:
>>>
>>>> Hi Pedro,
>> ---snip---
>>>> A fork event was reported to GDB before GDB knew about the parent thread,
>>>> followed immediately by a breakpoint event in a different thread.  The
>>>> parent thread was subsequently added via remote_notice_new_inferior in
>>>> process_stop_reply, but when the thread was added the thread_info.state
>>>> was set to THREAD_STOPPED.  The fork event was then handled correctly,
>>>> but when the fork parent was resumed via a call to keep_going, the state
>>>> was unchanged.
>>>
>>> Since this is non-stop, then it sounds to me like the bug is that the
>>> thread should have been added in THREAD_RUNNING state.
>>>
>>> Consider that infrun may be pulling target events out of the target_ops
>>> backend into its own event queue, but, not process them immediately.
>>>
>>> E.g., infrun may be stopping all threads temporarily for a step-over-breakpoint
>>> operation for thread A (stop_all_threads).  The waitstatus of all threads
>>> is thus left pending in the thread structure (save_status), including the
>>> fork event of thread B.  Right at this point, if the user
>>> does "info threads", that should show thread B (the fork parent) running,
>>> not stopped, even if internally, gdb is holding it paused for a little bit.
>>>
>>
>> Hi Pedro,
>> Here is a new patch that adds the threads with the state set to
>> THREAD_RUNNING for fork events.
> 
> I really meant it for _all_ kinds of events, not just fork.

Oh.  Oops.

> 
> I don't think we can do this:
> 
> +      if (status->kind == TARGET_WAITKIND_FORKED
> +	  || status->kind == TARGET_WAITKIND_VFORKED)
> +	remote_notice_new_inferior (ptid, 1);
> 
> in all-stop mode, as passing 1 means we'll end up installing a
> continuation that immediately resumes the whole process.

Huh.  Reading through the code I see this now, but don't understand why
I didn't run into any problems in my testing.  I guess I can go back and
do some debugging to clarify that for myself.

> 
> +      else
> +       remote_notice_new_inferior (ptid, 0);
> 
> and with this, if in non-stop, and the event ends up pending
> on the infrun side, "info threads" will be confused for that
> thread too.

I get this part now.

> 
> So the fix requires a bit of plumbing to pass down the correct
> external and internal thread states.  Something like the patch
> below.
> 
> WDYT?  Does it fix the timeout you observe?

This does fix the timeout, and it makes sense.

Nit: blank line in ChangeLog.
> 
> (We should really come up with better, less confusing names for the
> "running" vs "executing" distinction.  Maybe external/internal running
> state, or user/internal, or public/private...)

I like user/internal, like "user_run_state" and "internal_run_state".
But I never won any prizes for my ability to name things. :-S

Along the same lines, I don't get the distinction between
thread_info.resumed and thread_info.state==THREAD_RUNNING.  It seems
like they both mean that the thread is running from the user perspective.

I'm running the full GDB testsuite on the Nios II target to see if
anything else pops up.  I'll let you know the results of that.

Thanks for the review and the rework.
--Don

> 
> --------------
> [PATCH] PR remote/19496, timeout in forking-threads-plus-bkpt
> 
> This patch addresses a failure in
> gdb.threads/forking-threads-plus-breakpoint.exp:
> 
> FAIL: gdb.threads/forking-threads-plus-breakpoint.exp: cond_bp_target=1:
> detach_on_fork=on: inferior 1 exited (timeout)
> 
> Cause:
> 
> A fork event was reported to GDB before GDB knew about the parent
> thread, followed immediately by a breakpoint event in a different
> thread.  The parent thread was subsequently added via
> remote_notice_new_inferior in process_stop_reply, but when the thread
> was added the thread_info.state was set to THREAD_STOPPED.  The fork
> event was then handled correctly, but when the fork parent was resumed
> via a call to keep_going, the state was unchanged.
> 
> The breakpoint event was then handled, which caused all the
> non-breakpoint threads to be stopped.  When the breakpoint thread was
> resumed, all the non-breakpoint threads were resumed via
> infrun.c:restart_threads.  Our old fork parent wasn't restarted,
> because it still had thread_info.state set to THREAD_STOPPED.
> Ultimately the program under debug hung waiting for a pthread_join
> while the old fork parent was stopped forever by GDB.
> 
> Fix:
> 
> Since this is non-stop, then the bug is that the thread should have
> been added in THREAD_RUNNING state.  Consider that infrun may be
> pulling target events out of the target_ops backend into its own event
> queue, but, not process them immediately.  E.g., infrun may be
> stopping all threads temporarily for a step-over-breakpoint operation
> for thread A (stop_all_threads).  The waitstatus of all threads is
> thus left pending in the thread structure (save_status), including the
> fork event of thread B.  Right at this point, if the user does "info
> threads", that should show thread B (the fork parent) running, not
> stopped, even if internally, gdb is holding it paused for a little
> bit.
> 
> Thus if in non-stop mode, always add new threads in the external
> user-visible THREAD_RUNNING state.  Change remote_notice_new_inferior
> to accept the internal executing state of the thread instead, with
> EXECUTING set to 1 when we discover a thread that is running on the
> target (such as through remote_update_thread_list), and 0 when the
> thread is really paused (such as when we see a stop reply).
> 
> Tested on x86_64 Linux and Nios II Linux target with x86 Linux host.
> 
> gdb/ChangeLog:
> 2016-03-15  Pedro Alves  <palves@redhat.com>
> 	    Don Breazeal  <donb@codesourcery.com>
> 
> 	* infcmd.c (notice_new_inferior): Use the 'leave_running' argument
> 	instead of checking the 'non_stop' global.
> 
> 	* remote.c (remote_add_thread): New parameter 'executing'.  Use it
> 	to set the new thread's executing state.
> 	(remote_notice_new_inferior): Rename parameter 'running' to
> 	'executing'.  Always set the thread state to THREAD_RUNNING in
> 	non-stop mode, and to THREAD_STOPPED in all-stop mode.  Pass
> 	EXECUTING to remote_add_thread and notice_new_inferior.
> 	(remote_update_thread_list): Update to pass executing state, not
> 	running state.
> ---
>  gdb/infcmd.c |  7 +------
>  gdb/remote.c | 30 ++++++++++++++++++------------
>  2 files changed, 19 insertions(+), 18 deletions(-)
> 
> diff --git a/gdb/infcmd.c b/gdb/infcmd.c
> index a80bf0a..d687116 100644
> --- a/gdb/infcmd.c
> +++ b/gdb/infcmd.c
> @@ -2903,11 +2903,7 @@ notice_new_inferior (ptid_t ptid, int leave_running, int from_tty)
>  
>    old_chain = make_cleanup (null_cleanup, NULL);
>  
> -  /* If in non-stop, leave threads as running as they were.  If
> -     they're stopped for some reason other than us telling it to, the
> -     target reports a signal != GDB_SIGNAL_0.  We don't try to
> -     resume threads with such a stop signal.  */
> -  mode = non_stop ? ATTACH_POST_WAIT_RESUME : ATTACH_POST_WAIT_NOTHING;
> +  mode = leave_running ? ATTACH_POST_WAIT_RESUME : ATTACH_POST_WAIT_NOTHING;
>  
>    if (!ptid_equal (inferior_ptid, null_ptid))
>      make_cleanup_restore_current_thread ();
> @@ -2943,7 +2939,6 @@ notice_new_inferior (ptid_t ptid, int leave_running, int from_tty)
>        return;
>      }
>  
> -  mode = leave_running ? ATTACH_POST_WAIT_RESUME : ATTACH_POST_WAIT_NOTHING;
>    attach_post_wait ("" /* args */, from_tty, mode);
>  
>    do_cleanups (old_chain);
> diff --git a/gdb/remote.c b/gdb/remote.c
> index f09a06e..af0a08a 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -1801,7 +1801,7 @@ remote_add_inferior (int fake_pid_p, int pid, int attached,
>     according to RUNNING.  */
>  
>  static void
> -remote_add_thread (ptid_t ptid, int running)
> +remote_add_thread (ptid_t ptid, int running, int executing)
>  {
>    struct remote_state *rs = get_remote_state ();
>  
> @@ -1816,7 +1816,7 @@ remote_add_thread (ptid_t ptid, int running)
>    else
>      add_thread (ptid);
>  
> -  set_executing (ptid, running);
> +  set_executing (ptid, executing);
>    set_running (ptid, running);
>  }
>  
> @@ -1824,11 +1824,17 @@ remote_add_thread (ptid_t ptid, int running)
>     It may be the first time we hear about such thread, so take the
>     opportunity to add it to GDB's thread list.  In case this is the
>     first time we're noticing its corresponding inferior, add it to
> -   GDB's inferior list as well.  */
> +   GDB's inferior list as well.  EXECUTING indicates whether the
> +   thread is (internally) executing or stopped.  */
>  
>  static void
> -remote_notice_new_inferior (ptid_t currthread, int running)
> +remote_notice_new_inferior (ptid_t currthread, int executing)
>  {
> +  /* In non-stop mode, we assume new found threads are (externally)
> +     running until proven otherwise with a stop reply.  In all-stop,
> +     we can only get here if all threads are stopped.  */
> +  int running = target_is_non_stop_p () ? 1 : 0;
> +
>    /* If this is a new thread, add it to GDB's thread list.
>       If we leave it up to WFI to do this, bad things will happen.  */
>  
> @@ -1836,7 +1842,7 @@ remote_notice_new_inferior (ptid_t currthread, int running)
>      {
>        /* We're seeing an event on a thread id we knew had exited.
>  	 This has to be a new thread reusing the old id.  Add it.  */
> -      remote_add_thread (currthread, running);
> +      remote_add_thread (currthread, running, executing);
>        return;
>      }
>  
> @@ -1857,7 +1863,7 @@ remote_notice_new_inferior (ptid_t currthread, int running)
>  	    thread_change_ptid (inferior_ptid, currthread);
>  	  else
>  	    {
> -	      remote_add_thread (currthread, running);
> +	      remote_add_thread (currthread, running, executing);
>  	      inferior_ptid = currthread;
>  	    }
>  	  return;
> @@ -1888,7 +1894,7 @@ remote_notice_new_inferior (ptid_t currthread, int running)
>  	}
>  
>        /* This is really a new thread.  Add it.  */
> -      remote_add_thread (currthread, running);
> +      remote_add_thread (currthread, running, executing);
>  
>        /* If we found a new inferior, let the common code do whatever
>  	 it needs to with it (e.g., read shared libraries, insert
> @@ -1899,7 +1905,7 @@ remote_notice_new_inferior (ptid_t currthread, int running)
>  	  struct remote_state *rs = get_remote_state ();
>  
>  	  if (!rs->starting_up)
> -	    notice_new_inferior (currthread, running, 0);
> +	    notice_new_inferior (currthread, executing, 0);
>  	}
>      }
>  }
> @@ -3259,12 +3265,12 @@ remote_update_thread_list (struct target_ops *ops)
>  	    {
>  	      struct private_thread_info *info;
>  	      /* In non-stop mode, we assume new found threads are
> -		 running until proven otherwise with a stop reply.  In
> -		 all-stop, we can only get here if all threads are
> +		 executing until proven otherwise with a stop reply.
> +		 In all-stop, we can only get here if all threads are
>  		 stopped.  */
> -	      int running = target_is_non_stop_p () ? 1 : 0;
> +	      int executing = target_is_non_stop_p () ? 1 : 0;
>  
> -	      remote_notice_new_inferior (item->ptid, running);
> +	      remote_notice_new_inferior (item->ptid, executing);
>  
>  	      info = demand_private_info (item->ptid);
>  	      info->core = item->core;
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt
  2016-03-15 12:55           ` Pedro Alves
@ 2016-03-16 18:26             ` Don Breazeal
  2016-03-16 18:33               ` Pedro Alves
  0 siblings, 1 reply; 29+ messages in thread
From: Don Breazeal @ 2016-03-16 18:26 UTC (permalink / raw)
  To: gdb-patches, palves

On 3/15/2016 5:55 AM, Pedro Alves wrote:
> Hi Don,
> 
> On 02/11/2016 12:28 AM, Don Breazeal wrote:
> 
---snip---
> 
> Use with_timeout_factor instead so that the timeout is properly restored,
> and put it around the problematic test, only, instead of basically
> around the whole test case.  I think that'll be the "inferior 1 exited"
> test?

Thanks Pedro.  This is done in the patch below.  OK to push?
--Don

This patch addresses "fork:Interrupted system call" (or wait:) failures
in gdb.threads/forking-threads-plus-breakpoint.exp.

The test program spawns ten threads, each of which do ten fork/waitpid
sequences.  The cause of the problem was that when one of the fork
children exited before the corresponding fork parent could initiate its
waitpid for that child, a SIGCHLD and/or SIGSTOP was delivered and
interrupted a fork or waitpid in another thread.

The fix was to wrap the system calls in a loop to retry the call if
it was interrupted, like:

do
  {
    pid = fork ();
  }
while (pid == -1 && errno == EINTR);

Since this is a Linux-only test I figure it is OK to use errno and EINTR.
I tried a number of alternative fixes using SIG_IGN, SA_RESTART,
pthread_sigblock, and bsd_signal, but none of these worked as well.

Tested on Nios II Linux target with x86 Linux host.

gdb/testsuite/ChangeLog:
2016-03-16  Don Breazeal  <donb@codesourcery.com>

	* gdb.threads/forking-threads-plus-breakpoint.c (thread_forks):
	Retry fork and waitpid on interrupted system call errors.
	* gdb.threads/forking-threads-plus-breakpoint.exp: (do_test):
	Use with_timeout_factor to increase timeout to 90.

---
 .../gdb.threads/forking-threads-plus-breakpoint.c  |   14 +++++++++-
 .../forking-threads-plus-breakpoint.exp            |   27 +++++++++++---------
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
index fc64d93..c169e18 100644
--- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
+++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c
@@ -22,6 +22,7 @@
 #include <sys/types.h>
 #include <sys/wait.h>
 #include <stdlib.h>
+#include <errno.h>
 
 /* Number of threads.  Each thread continuously spawns a fork and wait
    for it.  If we have another thread continuously start a step over,
@@ -49,14 +50,23 @@ thread_forks (void *arg)
     {
       pid_t pid;
 
-      pid = fork ();
+      do
+	{
+	  pid = fork ();
+	}
+      while (pid == -1 && errno == EINTR);
 
       if (pid > 0)
 	{
 	  int status;
 
 	  /* Parent.  */
-	  pid = waitpid (pid, &status, 0);
+	  do
+	    {
+	      pid = waitpid (pid, &status, 0);
+	    }
+	  while (pid == -1 && errno == EINTR);
+
 	  if (pid == -1)
 	    {
 	      perror ("wait");
diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
index 3d8b308..9c700bf 100644
--- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
+++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
@@ -72,6 +72,7 @@ proc do_test { cond_bp_target detach_on_fork displaced } {
     global decimal gdb_prompt
     global linenum
     global is_remote_target
+    global timeout
 
     set saved_gdbflags $GDBFLAGS
     set GDBFLAGS [concat $GDBFLAGS " -ex \"set non-stop on\""]
@@ -115,18 +116,20 @@ proc do_test { cond_bp_target detach_on_fork displaced } {
     set fork_count 0
     set ok 0
 
-    set test "inferior 1 exited"
-    gdb_test_multiple "" $test {
-	-re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
-	    set ok 1
-	    pass $test
-	}
-	-re "Inferior $decimal \(\[^\r\n\]+\) exited normally" {
-	    incr fork_count
-	    if {$fork_count <= 100} {
-		exp_continue
-	    } else {
-		fail "$test (too many forks)"
+    with_timeout_factor 90 {
+        set test "inferior 1 exited"
+        gdb_test_multiple "" $test {
+	    -re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
+	        set ok 1
+	        pass $test
+	    }
+	    -re "Inferior $decimal \(\[^\r\n\]+\) exited normally" {
+	        incr fork_count
+	        if {$fork_count <= 100} {
+		    exp_continue
+	        } else {
+		    fail "$test (too many forks)"
+	        }
 	    }
 	}
     }
-- 
1.7.0.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt
  2016-03-16 18:26             ` Don Breazeal
@ 2016-03-16 18:33               ` Pedro Alves
  2016-03-16 22:18                 ` Don Breazeal
  0 siblings, 1 reply; 29+ messages in thread
From: Pedro Alves @ 2016-03-16 18:33 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches

On 03/16/2016 06:25 PM, Don Breazeal wrote:

> --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
> +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
> @@ -72,6 +72,7 @@ proc do_test { cond_bp_target detach_on_fork displaced } {
>       global decimal gdb_prompt
>       global linenum
>       global is_remote_target
> +    global timeout

Don't need this.

>
>       set saved_gdbflags $GDBFLAGS
>       set GDBFLAGS [concat $GDBFLAGS " -ex \"set non-stop on\""]
> @@ -115,18 +116,20 @@ proc do_test { cond_bp_target detach_on_fork displaced } {
>       set fork_count 0
>       set ok 0
>
> -    set test "inferior 1 exited"
> -    gdb_test_multiple "" $test {
> -	-re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
> -	    set ok 1
> -	    pass $test
> -	}
> -	-re "Inferior $decimal \(\[^\r\n\]+\) exited normally" {
> -	    incr fork_count
> -	    if {$fork_count <= 100} {
> -		exp_continue
> -	    } else {
> -		fail "$test (too many forks)"
> +    with_timeout_factor 90 {

Eeeek.  :-)  This is a factor, not absolute time.

The default is 10 seconds, so a factor of 9 or 10
should suffice.  I'd say make it 10, just because it is
round and matches the number of threads.

Otherwise OK.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt
  2016-03-16 18:33               ` Pedro Alves
@ 2016-03-16 22:18                 ` Don Breazeal
  0 siblings, 0 replies; 29+ messages in thread
From: Don Breazeal @ 2016-03-16 22:18 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 3/16/2016 11:33 AM, Pedro Alves wrote:
> On 03/16/2016 06:25 PM, Don Breazeal wrote:
> 
>> --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
>> +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp
>> @@ -72,6 +72,7 @@ proc do_test { cond_bp_target detach_on_fork displaced } {
>>       global decimal gdb_prompt
>>       global linenum
>>       global is_remote_target
>> +    global timeout
> 
> Don't need this.
> 
>>
>>       set saved_gdbflags $GDBFLAGS
>>       set GDBFLAGS [concat $GDBFLAGS " -ex \"set non-stop on\""]
>> @@ -115,18 +116,20 @@ proc do_test { cond_bp_target detach_on_fork displaced } {
>>       set fork_count 0
>>       set ok 0
>>
>> -    set test "inferior 1 exited"
>> -    gdb_test_multiple "" $test {
>> -	-re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
>> -	    set ok 1
>> -	    pass $test
>> -	}
>> -	-re "Inferior $decimal \(\[^\r\n\]+\) exited normally" {
>> -	    incr fork_count
>> -	    if {$fork_count <= 100} {
>> -		exp_continue
>> -	    } else {
>> -		fail "$test (too many forks)"
>> +    with_timeout_factor 90 {
> 
> Eeeek.  :-)  This is a factor, not absolute time.
> 
> The default is 10 seconds, so a factor of 9 or 10
> should suffice.  I'd say make it 10, just because it is
> round and matches the number of threads.
> 
> Otherwise OK.
> 
> Thanks,
> Pedro Alves
> 
Thanks Pedro.  This is now pushed, with changes.
--Don

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/3] PR remote/19496, timeout in forking-threads-plus-bkpt
  2016-03-16 17:29             ` Don Breazeal
@ 2016-03-16 22:51               ` Don Breazeal
  2016-03-17 10:38                 ` Pedro Alves
  0 siblings, 1 reply; 29+ messages in thread
From: Don Breazeal @ 2016-03-16 22:51 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 3/16/2016 10:28 AM, Don Breazeal wrote:
> On 3/15/2016 8:29 AM, Pedro Alves wrote:
--snip--
> I'm running the full GDB testsuite on the Nios II target to see if
> anything else pops up.  I'll let you know the results of that.
> 
Tests on Nios II Linux, x86 host showed no regressions with this patch.

Thanks again,
--Don

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/3] PR remote/19496, timeout in forking-threads-plus-bkpt
  2016-03-16 22:51               ` Don Breazeal
@ 2016-03-17 10:38                 ` Pedro Alves
  0 siblings, 0 replies; 29+ messages in thread
From: Pedro Alves @ 2016-03-17 10:38 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches

On 03/16/2016 10:51 PM, Don Breazeal wrote:
> On 3/16/2016 10:28 AM, Don Breazeal wrote:
>> On 3/15/2016 8:29 AM, Pedro Alves wrote:
> --snip--
>> I'm running the full GDB testsuite on the Nios II target to see if
>> anything else pops up.  I'll let you know the results of that.
>>
> Tests on Nios II Linux, x86 host showed no regressions with this patch.

Excellent, I've removed the spurious empty line in the ChangeLog
entry and pushed it in.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-03-17 10:38 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-28  0:48 [PATCH 0/3] PR remote/19496, remote fork failures Don Breazeal
2016-01-28  0:48 ` [PATCH 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt Don Breazeal
2016-02-01 19:38   ` Pedro Alves
2016-01-28  0:48 ` [PATCH 3/3] PR remote/19496, timeout " Don Breazeal
2016-02-01 12:05   ` Pedro Alves
2016-02-01 19:29     ` Don Breazeal
2016-02-01 20:09       ` Pedro Alves
2016-02-11  0:28         ` [PATCH v2 2/3] PR remote/19496, interrupted syscall " Don Breazeal
2016-02-25 17:28           ` Don Breazeal
2016-03-03 18:19             ` [PING] " Don Breazeal
2016-03-14 21:29               ` Don Breazeal
2016-03-15 12:55           ` Pedro Alves
2016-03-16 18:26             ` Don Breazeal
2016-03-16 18:33               ` Pedro Alves
2016-03-16 22:18                 ` Don Breazeal
2016-02-11  0:28         ` [PATCH v2 3/3] PR remote/19496, timeout " Don Breazeal
2016-02-25 17:29           ` [PING]Re: " Don Breazeal
2016-03-03 18:20             ` [PING] " Don Breazeal
2016-03-14 21:30               ` Don Breazeal
2016-03-15 15:30           ` Pedro Alves
2016-03-16 17:29             ` Don Breazeal
2016-03-16 22:51               ` Don Breazeal
2016-03-17 10:38                 ` Pedro Alves
2016-01-28  0:48 ` [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt Don Breazeal
2016-02-01 19:26   ` [pushed] Test gdb.threads/forking-threads-plus-breakpoint.exp with, displaced stepping off (Re: [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt) Pedro Alves
2016-02-01 20:38   ` [PATCH 1/3] PR remote/19496, internal err forking-threads-plus-bkpt Pedro Alves
2016-02-11  0:26     ` [PATCH v2 " Don Breazeal
2016-02-12 20:15       ` Pedro Alves
2016-02-16 17:21         ` Don Breazeal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).