[PATCH] Fix gdbserver handling of just-exited threads (gdb.threads/interrupted-hand-call.exp race)

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [PATCH] Fix gdbserver handling of just-exited threads (gdb.threads/interrupted-hand-call.exp race)
@ 2021-11-09 18:47 Pedro Alves
  2021-11-13  2:35 ` Simon Marchi
  0 siblings, 1 reply; 2+ messages in thread
From: Pedro Alves @ 2021-11-09 18:47 UTC (permalink / raw)
  To: gdb-patches

When testing against gdbserver, gdb.threads/interrupted-hand-call.exp
occasionally fails like this:

  FAIL: gdb.threads/interrupted-hand-call.exp: continue (timeout)

Hacking-in "set debug infrun" logs we see:

   [infrun] fetch_inferior_event: enter
     [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
     [infrun] print_target_wait_results:   -1.0.0 [process -1],
     [infrun] print_target_wait_results:   status->kind = no-resumed
     [infrun] handle_inferior_event: status->kind = no-resumed
     [infrun] handle_no_resumed: TARGET_WAITKIND_NO_RESUMED (ignoring: found resumed)
     [infrun] prepare_to_wait: prepare_to_wait
   [infrun] fetch_inferior_event: exit
   FAIL: gdb.threads/interrupted-hand-call.exp: continue (timeout)
   Remote debugging from host ::1, port 35162
   Process /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/interrupted-hand-call/interrupted-hand-call created; pid = 3508901

"set debug remote" logs show:

   [remote] Sending packet: $vCont;c:p398f62.-1#50
   [remote] wait: enter
     [remote] Packet received: N
   [remote] wait: exit

And on gdbserver, we see:

  >>>> entering ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)
  wait_1: [<all threads>]
  LWFE: waitpid(-1, ...) returned 480013, ERRNO-OK
  LLW: waitpid 480013 received 0 (exited)
  LLFE: 480013 exited.
  deleting 480013
  LWFE: waitpid(-1, ...) returned 480011, ERRNO-OK
  LLW: waitpid 480011 received 0 (exited)
  LLFE: 480011 exited.
  deleting 480011
  LWFE: waitpid(-1, ...) returned 480010, ERRNO-OK
  LLW: waitpid 480010 received 0 (exited)
  LLFE: 480010 exited.
  deleting 480010
  LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
  RSRL: resuming stopped-resumed LWP LWP 480009.480012 at 7ffff7f96376: step=0
    continue from pc 0x7ffff7f96376
  Resuming lwp 480012 (continue, signal 0, stop not expected)
  leader_pid=480009, leader_lp!=NULL=1, num_lwps=2, zombie=1
  CZL: Thread group leader 480009 zombie (it exited, or another thread execd).
  deleting 480009
  LLW: exit (no unwaited-for LWP)
  wait_1 ret = null_ptid, TARGET_WAITKIND_NO_RESUMED
  <<<< exiting ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)
  Writing resume reply for <null thread>:13
  putpkt ("$N#4e"); [noack mode]

Note gdbserver noticed the leader exited, and deleted it.  And then
returned TARGET_WAITKIND_NO_RESUMED.  But a thread had just been
resumed, so how come gdbserver doesn't know there are resumed threads?

The answer is that lwp 480012 exits (because some other thread killed
the whole process) just while gdbserver was resuming it, in
linux_process_target::resume_one_lwp_throw.  That function throws an
error before marking the lwp/thread as not-stopped.

If we try to resume a stopped thread, and we get back an ESRCH, the
best is to just pretend that it resumed successfully (like we already
do in other places), and let the wait path collect the exit code.
This is what this patch does.

Change-Id: I9ee17afe323d33c03d6d220df76304e4256749c9
---
 gdbserver/linux-low.cc | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc
index 34ede238d19..2ab7a177d3f 100644
--- a/gdbserver/linux-low.cc
+++ b/gdbserver/linux-low.cc
@@ -4253,8 +4253,19 @@ linux_process_target::resume_one_lwp_throw (lwp_info *lwp, int step,
 	  (PTRACE_TYPE_ARG4) (uintptr_t) signal);
 
   current_thread = saved_thread;
-  if (errno)
-    perror_with_name ("resuming thread");
+
+  /* If we try to resume a stopped thread, and we get back an ESRCH,
+     it means the thread is gone.  In that case, pretend that the
+     thread was resumed successfully, and let the wait path collect
+     the exit code.  */
+  if (errno && errno != ESRCH)
+    {
+      if (debug_threads)
+	debug_printf ("Resuming lwp %ld failed with %d: %s\n",
+		      lwpid_of (thread),
+		      errno, strerror (errno));
+      perror_with_name ("resuming thread");
+    }
 
   /* Successfully resumed.  Clear state that no longer makes sense,
      and mark the LWP as running.  Must not do this before resuming

base-commit: f0bbba7886f5dba158a143bebbd0691591f22b9f
-- 
2.26.2


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] Fix gdbserver handling of just-exited threads (gdb.threads/interrupted-hand-call.exp race)
  2021-11-09 18:47 [PATCH] Fix gdbserver handling of just-exited threads (gdb.threads/interrupted-hand-call.exp race) Pedro Alves
@ 2021-11-13  2:35 ` Simon Marchi
  0 siblings, 0 replies; 2+ messages in thread
From: Simon Marchi @ 2021-11-13  2:35 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 2021-11-09 13:47, Pedro Alves wrote:
> When testing against gdbserver, gdb.threads/interrupted-hand-call.exp
> occasionally fails like this:
> 
>   FAIL: gdb.threads/interrupted-hand-call.exp: continue (timeout)
> 
> Hacking-in "set debug infrun" logs we see:
> 
>    [infrun] fetch_inferior_event: enter
>      [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
>      [infrun] print_target_wait_results:   -1.0.0 [process -1],
>      [infrun] print_target_wait_results:   status->kind = no-resumed
>      [infrun] handle_inferior_event: status->kind = no-resumed
>      [infrun] handle_no_resumed: TARGET_WAITKIND_NO_RESUMED (ignoring: found resumed)
>      [infrun] prepare_to_wait: prepare_to_wait
>    [infrun] fetch_inferior_event: exit
>    FAIL: gdb.threads/interrupted-hand-call.exp: continue (timeout)
>    Remote debugging from host ::1, port 35162
>    Process /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/interrupted-hand-call/interrupted-hand-call created; pid = 3508901
> 
> "set debug remote" logs show:
> 
>    [remote] Sending packet: $vCont;c:p398f62.-1#50
>    [remote] wait: enter
>      [remote] Packet received: N
>    [remote] wait: exit
> 
> And on gdbserver, we see:
> 
>   >>>> entering ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)
>   wait_1: [<all threads>]
>   LWFE: waitpid(-1, ...) returned 480013, ERRNO-OK
>   LLW: waitpid 480013 received 0 (exited)
>   LLFE: 480013 exited.
>   deleting 480013
>   LWFE: waitpid(-1, ...) returned 480011, ERRNO-OK
>   LLW: waitpid 480011 received 0 (exited)
>   LLFE: 480011 exited.
>   deleting 480011
>   LWFE: waitpid(-1, ...) returned 480010, ERRNO-OK
>   LLW: waitpid 480010 received 0 (exited)
>   LLFE: 480010 exited.
>   deleting 480010
>   LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
>   RSRL: resuming stopped-resumed LWP LWP 480009.480012 at 7ffff7f96376: step=0
>     continue from pc 0x7ffff7f96376
>   Resuming lwp 480012 (continue, signal 0, stop not expected)
>   leader_pid=480009, leader_lp!=NULL=1, num_lwps=2, zombie=1
>   CZL: Thread group leader 480009 zombie (it exited, or another thread execd).
>   deleting 480009
>   LLW: exit (no unwaited-for LWP)
>   wait_1 ret = null_ptid, TARGET_WAITKIND_NO_RESUMED
>   <<<< exiting ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)
>   Writing resume reply for <null thread>:13
>   putpkt ("$N#4e"); [noack mode]
> 
> Note gdbserver noticed the leader exited, and deleted it.  And then
> returned TARGET_WAITKIND_NO_RESUMED.  But a thread had just been
> resumed, so how come gdbserver doesn't know there are resumed threads?
> 
> The answer is that lwp 480012 exits (because some other thread killed
> the whole process) just while gdbserver was resuming it, in
> linux_process_target::resume_one_lwp_throw.  That function throws an
> error before marking the lwp/thread as not-stopped.
> 
> If we try to resume a stopped thread, and we get back an ESRCH, the
> best is to just pretend that it resumed successfully (like we already
> do in other places), and let the wait path collect the exit code.
> This is what this patch does.
> 
> Change-Id: I9ee17afe323d33c03d6d220df76304e4256749c9
> ---
>  gdbserver/linux-low.cc | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc
> index 34ede238d19..2ab7a177d3f 100644
> --- a/gdbserver/linux-low.cc
> +++ b/gdbserver/linux-low.cc
> @@ -4253,8 +4253,19 @@ linux_process_target::resume_one_lwp_throw (lwp_info *lwp, int step,
>  	  (PTRACE_TYPE_ARG4) (uintptr_t) signal);
>  
>    current_thread = saved_thread;
> -  if (errno)
> -    perror_with_name ("resuming thread");
> +
> +  /* If we try to resume a stopped thread, and we get back an ESRCH,
> +     it means the thread is gone.  In that case, pretend that the
> +     thread was resumed successfully, and let the wait path collect
> +     the exit code.  */
> +  if (errno && errno != ESRCH)
> +    {
> +      if (debug_threads)
> +	debug_printf ("Resuming lwp %ld failed with %d: %s\n",
> +		      lwpid_of (thread),
> +		      errno, strerror (errno));
> +      perror_with_name ("resuming thread");
> +    }

Use "errno != 0", since it's an integer?

I would maybe add a logging statement in the "errno != 0 && errno !=
ESRCH" case.  But otherwise, the patch LGTM.

Simon

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-11-13  2:35 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-09 18:47 [PATCH] Fix gdbserver handling of just-exited threads (gdb.threads/interrupted-hand-call.exp race) Pedro Alves
2021-11-13  2:35 ` Simon Marchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).