public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Pedro Alves <pedro@palves.net>
To: gdb-patches@sourceware.org
Subject: [PATCH 08/11] Re-add zombie leader on exit, gdbserver/linux
Date: Thu,  3 Mar 2022 14:40:17 +0000	[thread overview]
Message-ID: <20220303144020.3601082-9-pedro@palves.net> (raw)
In-Reply-To: <20220303144020.3601082-1-pedro@palves.net>

Same as the previous patch, but for GDBserver.

In summary, the current zombie leader detection code in linux-low.cc
has a race -- if a multi-threaded inferior exits just before
check_zombie_leaders finds that the leader is now zombie via checking
/proc/PID/status, check_zombie_leaders deletes the leader, assuming we
won't get an event for that exit (which we won't in some scenarios,
but not in this one), which is a false-positive scenario, where the
whole process is simply exiting.  Later when we see the last LWP in
our list exit, we report that LWP's exit status as exit code, even
though for the (real) parent process, the exit code that counts is the
child's leader thread's exit code.

Like for GDB, the solution here is to:

   - only report whole-process exit events for the leader.
   - re-add the leader back to the LWP list when we finally see it
     exit.

Change-Id: Id2d7bbb51a415534e1294fff1d555b7ecaa87f1d
---
 gdbserver/linux-low.cc | 120 ++++++++++++++++++++++++++++-------------
 1 file changed, 82 insertions(+), 38 deletions(-)

diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc
index 4635d310839..5c037881670 100644
--- a/gdbserver/linux-low.cc
+++ b/gdbserver/linux-low.cc
@@ -135,6 +135,15 @@ typedef struct
 /* Does the current host support PTRACE_GETREGSET?  */
 int have_ptrace_getregset = -1;
 
+/* Return TRUE if THREAD is the leader thread of the process.  */
+
+static bool
+is_leader (thread_info *thread)
+{
+  ptid_t ptid = ptid_of (thread);
+  return ptid.pid () == ptid.lwp ();
+}
+
 /* LWP accessors.  */
 
 /* See nat/linux-nat.h.  */
@@ -1733,42 +1742,63 @@ linux_process_target::check_zombie_leaders ()
 
       if (leader_lp != NULL && !leader_lp->stopped
 	  /* Check if there are other threads in the group, as we may
-	     have raced with the inferior simply exiting.  */
+	     have raced with the inferior simply exiting.  Note this
+	     isn't a watertight check.  If the inferior is
+	     multi-threaded and is exiting, it may be we see the
+	     leader as zombie before we reap all the non-leader
+	     threads.  See comments below.  */
 	  && !last_thread_of_process_p (leader_pid)
 	  && linux_proc_pid_is_zombie (leader_pid))
 	{
-	  /* A leader zombie can mean one of two things:
-
-	     - It exited, and there's an exit status pending
-	     available, or only the leader exited (not the whole
-	     program).  In the latter case, we can't waitpid the
-	     leader's exit status until all other threads are gone.
-
-	     - There are 3 or more threads in the group, and a thread
-	     other than the leader exec'd.  On an exec, the Linux
-	     kernel destroys all other threads (except the execing
-	     one) in the thread group, and resets the execing thread's
-	     tid to the tgid.  No exit notification is sent for the
-	     execing thread -- from the ptracer's perspective, it
-	     appears as though the execing thread just vanishes.
-	     Until we reap all other threads except the leader and the
-	     execing thread, the leader will be zombie, and the
-	     execing thread will be in `D (disc sleep)'.  As soon as
-	     all other threads are reaped, the execing thread changes
-	     it's tid to the tgid, and the previous (zombie) leader
-	     vanishes, giving place to the "new" leader.  We could try
-	     distinguishing the exit and exec cases, by waiting once
-	     more, and seeing if something comes out, but it doesn't
-	     sound useful.  The previous leader _does_ go away, and
-	     we'll re-add the new one once we see the exec event
-	     (which is just the same as what would happen if the
-	     previous leader did exit voluntarily before some other
-	     thread execs).  */
-
+	  /* A zombie leader in a multi-threaded program can mean one
+	     of three things:
+
+	     #1 - Only the leader exited, not the whole program, e.g.,
+	     with pthread_exit.  Since we can't reap the leader's exit
+	     status until all other threads are gone and reaped too,
+	     we want to delete the zombie leader right away, as it
+	     can't be debugged, we can't read its registers, etc.
+	     This is the main reason we check for zombie leaders
+	     disappearing.
+
+	     #2 - The whole thread-group/process exited (a group exit,
+	     via e.g. exit(3), and there is (or will be shortly) an
+	     exit reported for each thread in the process, and then
+	     finally an exit for the leader once the non-leaders are
+	     reaped.
+
+	     #3 - There are 3 or more threads in the group, and a
+	     thread other than the leader exec'd.  See comments on
+	     exec events at the top of the file.
+
+	     Ideally we would never delete the leader for case #2.
+	     Instead, we want to collect the exit status of each
+	     non-leader thread, and then finally collect the exit
+	     status of the leader as normal and use its exit code as
+	     whole-process exit code.  Unfortunately, there's no
+	     race-free way to distinguish cases #1 and #2.  We can't
+	     assume the exit events for the non-leaders threads are
+	     already pending in the kernel, nor can we assume the
+	     non-leader threads are in zombie state already.  Between
+	     the leader becoming zombie and the non-leaders exiting
+	     and becoming zombie themselves, there's a small time
+	     window, so such a check would be racy.  Temporarily
+	     pausing all threads and checking to see if all threads
+	     exit or not before re-resuming them would work in the
+	     case that all threads are running right now, but it
+	     wouldn't work if some thread is currently already
+	     ptrace-stopped, e.g., due to scheduler-locking.
+
+	     So what we do is we delete the leader anyhow, and then
+	     later on when we see its exit status, we re-add it back.
+	     We also make sure that we only report a whole-process
+	     exit when we see the leader exiting, as opposed to when
+	     the last LWP in the LWP list exits, which can be a
+	     non-leader if we deleted the leader here.  */
 	  threads_debug_printf ("Thread group leader %d zombie "
-				"(it exited, or another thread execd).",
+				"(it exited, or another thread execd), "
+				"deleting it.",
 				leader_pid);
-
 	  delete_lwp (leader_lp);
 	}
     });
@@ -2185,7 +2215,22 @@ linux_process_target::filter_event (int lwpid, int wstat)
 	  /* Don't report an event for the exit of an LWP not in our
 	     list, i.e. not part of any inferior we're debugging.
 	     This can happen if we detach from a program we originally
-	     forked and then it exits.  */
+	     forked and then it exits.  However, note that we may have
+	     earlier deleted a leader of an inferior we're debugging,
+	     in check_zombie_leaders.  Re-add it back here if so.  */
+	  find_process ([&] (process_info *proc)
+	    {
+	      if (proc->pid == lwpid)
+		{
+		  threads_debug_printf
+		    ("Re-adding thread group leader LWP %d after exit.",
+		     lwpid);
+
+		  child = add_lwp (ptid_t (lwpid, lwpid));
+		  return true;
+		}
+	      return false;
+	    });
 	}
 
       if (child == nullptr)
@@ -2209,11 +2254,10 @@ linux_process_target::filter_event (int lwpid, int wstat)
 	  unsuspend_all_lwps (child);
 	}
 
-      /* If there is at least one more LWP, then the exit signal was
-	 not the end of the debugged application and should be
-	 ignored, unless GDB wants to hear about thread exits.  */
-      if (cs.report_thread_events
-	  || last_thread_of_process_p (pid_of (thread)))
+      /* If this is not the leader LWP, then the exit signal was not
+	 the end of the debugged application and should be ignored,
+	 unless GDB wants to hear about thread exits.  */
+      if (cs.report_thread_events || is_leader (thread))
 	{
 	  /* Since events are serialized to GDB core, and we can't
 	     report this one right now.  Leave the status pending for
@@ -2780,7 +2824,7 @@ linux_process_target::filter_exit_event (lwp_info *event_child,
   struct thread_info *thread = get_lwp_thread (event_child);
   ptid_t ptid = ptid_of (thread);
 
-  if (!last_thread_of_process_p (pid_of (thread)))
+  if (!is_leader (thread))
     {
       if (cs.report_thread_events)
 	ourstatus->set_thread_exited (0);
-- 
2.26.2


  parent reply	other threads:[~2022-03-03 14:40 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-03 14:40 [PATCH 00/11] Linux: Fix issues around thread group leader exits Pedro Alves
2022-03-03 14:40 ` [PATCH 01/11] Fix gdbserver/linux target_waitstatus logging assert Pedro Alves
2022-03-03 14:40 ` [PATCH 02/11] Fix gdb.threads/clone-new-thread-event.exp race Pedro Alves
2022-03-03 14:40 ` [PATCH 03/11] Fix gdb.threads/current-lwp-dead.exp race Pedro Alves
2022-03-03 14:40 ` [PATCH 04/11] gdb: Reorganize linux_nat_filter_event Pedro Alves
2022-03-03 14:40 ` [PATCH 05/11] gdbserver: Reorganize linux_process_target::filter_event Pedro Alves
2022-03-03 14:40 ` [PATCH 06/11] gdbserver: Reindent check_zombie_leaders Pedro Alves
2022-03-03 14:40 ` [PATCH 07/11] Re-add zombie leader on exit, gdb/linux Pedro Alves
2022-03-07 20:08   ` Simon Marchi
2022-03-07 20:27     ` Pedro Alves
2022-03-07 20:31       ` Simon Marchi
2022-03-09 14:37         ` Pedro Alves
2022-03-03 14:40 ` Pedro Alves [this message]
2022-03-03 14:40 ` [PATCH 09/11] Ensure EXIT is last event, gdb/linux Pedro Alves
2022-03-07 20:24   ` Simon Marchi
2022-03-09  0:21     ` Lancelot SIX
2022-03-09 14:45       ` Pedro Alves
2022-03-09 22:29         ` Lancelot SIX
2022-03-10 11:46           ` Pedro Alves
2022-03-03 14:40 ` [PATCH 10/11] Ensure EXIT is last event, gdbserver/linux Pedro Alves
2022-03-03 14:40 ` [PATCH 11/11] Process exit status is leader exit status testcase Pedro Alves
2023-06-22 11:28   ` Ilya Leoshkevich
2023-06-22 13:07     ` Tom de Vries

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220303144020.3601082-9-pedro@palves.net \
    --to=pedro@palves.net \
    --cc=gdb-patches@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).