public inbox for gdb@sourceware.org
* GDB hangs with simple multi-threaded program on linux
@ 2010-07-15 15:46 Thiago Jung Bauermann
  2010-07-15 18:44 ` Tom Tromey
  0 siblings, 1 reply; 5+ messages in thread
From: Thiago Jung Bauermann @ 2010-07-15 15:46 UTC (permalink / raw)
  To: gdb

Hi,

I'm struggling with an issue which perhaps you already faced or thought
about...

The following testcase locks GDB nearly every time on Linux:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define NUM_THREADS     2

pthread_t main_thread;

void *print_hello (void *threadid)
{
   int tid = (int) (long) threadid;

   printf ("Hello world! It's me, thread #%d!\n", tid);

   /* The first thread waits for main to terminate.  */
   if (tid == 0)
     pthread_join (main_thread, NULL);

   pthread_exit (NULL);
}

int main (int argc, char *argv[])
{
   int i, rc;
   pthread_t threads[NUM_THREADS];

   main_thread = pthread_self ();

   for (i = 0; i < NUM_THREADS; i++) {
      printf ("In main: creating thread %d\n", i);

      rc = pthread_create (&threads[i], NULL, print_hello, (void *) (long) i);
      if (rc) {
         printf ("ERROR; return code from pthread_create is %d\n", rc);
         exit (-1);
      }
   }

   pthread_exit (NULL);
}

What's special about this testcase is that the main thread exits earlier
than the threads it creates.

When GDB is notified about a signal in some thread, it sends a SIGSTOP
to the other threads in the process and then calls waitpid on them to
make sure that they have indeed stopped (at the end of linux_nat_wait_1,
when it calls stop_callback and stop_wait_callback on all LWPs).
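
As a rough illustration of the "signal one thread" half of that step,
here is a minimal sketch of thread-directed signalling on Linux. The
helper names signal_one_thread and current_tid are made up for this
example and are not GDB functions; GDB's own code may do this
differently. The point is only that kill() targets the whole process,
while the tgkill system call targets a single LWP.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a GDB function): deliver SIG to the single
   thread TID inside thread group TGID.  With SIG == 0 the kernel only
   performs the existence and permission checks.  Returns 0 on success,
   or -1 with errno set (ESRCH if no such thread exists).  */
static int
signal_one_thread (pid_t tgid, pid_t tid, int sig)
{
  return (int) syscall (SYS_tgkill, tgid, tid, sig);
}

/* Hypothetical helper: the kernel thread ID (LWP ID) of the caller.
   For the main thread this equals getpid ().  */
static pid_t
current_tid (void)
{
  return (pid_t) syscall (SYS_gettid);
}
```

A debugger-like caller would pass SIGSTOP for each LWP it tracks and
then wait for each one to report the stop; the waiting half is where
the hang described below shows up.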

Normally this is OK, but here the main thread has already exited by the
time GDB is notified about a signal in some thread (and GDB is oblivious
to this fact). GDB sends a SIGSTOP to every thread in the debuggee
(including the zombie main thread), and when it goes on to wait on those
threads, it hangs while waiting on the main thread.

I suspect that waitpid interprets the call to wait on the main thread to
actually mean waiting on the whole program instead (since TID == PID in
this case) and hangs because there are other threads in the thread group
(even though they are in the tracing stop state).

So my questions are:

1. Is it true that when the main thread exits but there are other
threads in the thread group, then no SIGCHLD is generated to notify GDB
that it exited (perhaps because such a SIGCHLD could be ambiguous and
mean that the whole process exited)? If so, how can GDB learn when the
main thread exits? That would explain why GDB thinks the main thread is
still around. Either that, or GDB missed the SIGCHLD, or it is later in
the queue and not yet processed.

2. Is there a way for GDB to wait on just the main thread instead of on
the whole process when it waits on a TID which is also the PID?

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


* Re: GDB hangs with simple multi-threaded program on linux
  2010-07-15 15:46 GDB hangs with simple multi-threaded program on linux Thiago Jung Bauermann
@ 2010-07-15 18:44 ` Tom Tromey
  2010-07-15 18:47   ` Tom Tromey
                     ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Tom Tromey @ 2010-07-15 18:44 UTC (permalink / raw)
  To: Thiago Jung Bauermann; +Cc: gdb

[-- Attachment #1: Type: text/plain, Size: 1032 bytes --]

>>>>> "Thiago" == Thiago Jung Bauermann <bauerman@br.ibm.com> writes:

Thiago> I'm struggling with an issue which perhaps you already faced or
Thiago> thought about...

I asked around about this, and it turns out that we have a patch in the
Fedora SRPM for it.

The approach in this patch seems to be racy.  Roland says we can do
better if we enable exit tracing.  I see this in linux-nat.c:

  /* Do not enable PTRACE_O_TRACEEXIT until GDB is more prepared to support
     read-only process state.  */

I wonder what that means :-)

Thiago> 1. Is it true that when the main thread exits but there are other
Thiago> threads in the thread group, then no SIGCHLD is generated to notify GDB
Thiago> that it exited (perhaps because such a SIGCHLD could be ambiguous and
Thiago> mean that the whole process exited)?

Yes, Roland said that no SIGCHLD is generated.

Thiago> 2. Is there a way for GDB to wait on just the main thread instead of on
Thiago> the whole process when it waits on a TID which is also the PID?

I guess not.

Tom


[-- Attachment #2: leader-exit-fix.patch --]
[-- Type: text/x-patch, Size: 4755 bytes --]

2007-07-08  Jan Kratochvil  <jan.kratochvil@redhat.com>

	* linux-nat.c (linux_lwp_is_zombie): New function.
	(wait_lwp): Fix lockup on exit of the thread group leader.
	(linux_xfer_partial): Renamed to ...
	(linux_xfer_partial_lwp): ... here.
	(linux_xfer_partial): New function wrapping LINUX_XFER_PARTIAL_LWP.

2008-02-24  Jan Kratochvil  <jan.kratochvil@redhat.com>

	Port to GDB-6.8pre.

Index: gdb-6.8.50.20081209/gdb/linux-nat.c
===================================================================
--- gdb-6.8.50.20081209.orig/gdb/linux-nat.c	2008-12-10 01:27:34.000000000 +0100
+++ gdb-6.8.50.20081209/gdb/linux-nat.c	2008-12-10 01:28:14.000000000 +0100
@@ -1981,6 +1981,31 @@ linux_handle_extended_wait (struct lwp_i
 		  _("unknown ptrace event %d"), event);
 }
 
+static int
+linux_lwp_is_zombie (long lwp)
+{
+  char buffer[MAXPATHLEN];
+  FILE *procfile;
+  int retval = 0;
+
+  sprintf (buffer, "/proc/%ld/status", lwp);
+  procfile = fopen (buffer, "r");
+  if (procfile == NULL)
+    {
+      warning (_("unable to open /proc file '%s'"), buffer);
+      return 0;
+    }
+  while (fgets (buffer, sizeof (buffer), procfile) != NULL)
+    if (strcmp (buffer, "State:\tZ (zombie)\n") == 0)
+      {
+	retval = 1;
+	break;
+      }
+  fclose (procfile);
+
+  return retval;
+}
+
 /* Wait for LP to stop.  Returns the wait status, or 0 if the LWP has
    exited.  */
 
@@ -1988,16 +2013,31 @@ static int
 wait_lwp (struct lwp_info *lp)
 {
   pid_t pid;
-  int status;
+  int status = 0;
   int thread_dead = 0;
 
   gdb_assert (!lp->stopped);
   gdb_assert (lp->status == 0);
 
-  pid = my_waitpid (GET_LWP (lp->ptid), &status, 0);
-  if (pid == -1 && errno == ECHILD)
+  /* Thread group leader may have exited but we would lock up by WAITPID as it
+     waits on all its threads; __WCLONE is not applicable for the leader.
+     The thread leader restriction is only a performance optimization here.
+     LINUX_NAT_THREAD_ALIVE cannot be used here as it requires a STOPPED
+     process; it gets ESRCH both for the zombie and for running processes.  */
+  if (is_lwp (lp->ptid) && GET_PID (lp->ptid) == GET_LWP (lp->ptid)
+      && linux_lwp_is_zombie (GET_LWP (lp->ptid)))
+    {
+      thread_dead = 1;
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog, "WL: Threads leader %s vanished.\n",
+			    target_pid_to_str (lp->ptid));
+    }
+
+  if (!thread_dead)
     {
-      pid = my_waitpid (GET_LWP (lp->ptid), &status, __WCLONE);
+      pid = my_waitpid (GET_LWP (lp->ptid), &status, 0);
+      if (pid == -1 && errno == ECHILD)
+	pid = my_waitpid (GET_LWP (lp->ptid), &status, __WCLONE);
       if (pid == -1 && errno == ECHILD)
 	{
 	  /* The thread has previously exited.  We need to delete it
@@ -4153,8 +4193,10 @@ linux_nat_xfer_osdata (struct target_ops
   return len;
 }
 
+/* Transfer from the specific LWP currently set by PID of INFERIOR_PTID.  */
+
 static LONGEST
-linux_xfer_partial (struct target_ops *ops, enum target_object object,
+linux_xfer_partial_lwp (struct target_ops *ops, enum target_object object,
                     const char *annex, gdb_byte *readbuf,
 		    const gdb_byte *writebuf, ULONGEST offset, LONGEST len)
 {
@@ -4201,6 +4243,45 @@ linux_xfer_partial (struct target_ops *o
 			     offset, len);
 }
 
+/* nptl_db expects being able to transfer memory just by specifying PID.
+   After the thread group leader exits the Linux kernel turns the task
+   into a zombie, no longer permitting accesses to its memory.
+   Transfer the memory from an arbitrary LWP_LIST entry in such case.  */
+
+static LONGEST
+linux_xfer_partial (struct target_ops *ops, enum target_object object,
+                    const char *annex, gdb_byte *readbuf,
+		    const gdb_byte *writebuf, ULONGEST offset, LONGEST len)
+{
+  LONGEST xfer;
+  struct lwp_info *lp;
+  /* Not using SAVE_INFERIOR_PTID already here for better performance.  */
+  struct cleanup *old_chain = NULL;
+  ptid_t inferior_ptid_orig = inferior_ptid;
+
+  errno = 0;
+  xfer = linux_xfer_partial_lwp (ops, object, annex, readbuf, writebuf,
+				 offset, len);
+
+  for (lp = lwp_list; xfer == 0 && (errno == EACCES || errno == ESRCH)
+		      && lp != NULL; lp = lp->next)
+    {
+      if (!is_lwp (lp->ptid) || ptid_equal (lp->ptid, inferior_ptid_orig))
+        continue;
+      
+      if (old_chain == NULL)
+	old_chain = save_inferior_ptid ();
+      inferior_ptid = BUILD_LWP (GET_LWP (lp->ptid), GET_LWP (lp->ptid));
+      errno = 0;
+      xfer = linux_xfer_partial_lwp (ops, object, annex, readbuf, writebuf,
+				     offset, len);
+    }
+
+  if (old_chain != NULL)
+    do_cleanups (old_chain);
+  return xfer;
+}
+
 /* Create a prototype generic GNU/Linux target.  The client can override
    it with local methods.  */
 

[-- Attachment #3: leader-exit-test.patch --]
[-- Type: text/x-patch, Size: 3546 bytes --]

2007-07-07  Jan Kratochvil  <jan.kratochvil@redhat.com>

	* gdb.threads/leader-exit.c, gdb.threads/leader-exit.exp: New files.

--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ ./gdb/testsuite/gdb.threads/leader-exit.c	7 Jul 2007 15:21:57 -0000
@@ -0,0 +1,47 @@
+/* Clean exit of the thread group leader should not break GDB.
+
+   Copyright 2007 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 59 Temple Place - Suite 330,
+   Boston, MA 02111-1307, USA.  */
+
+#include <pthread.h>
+#include <assert.h>
+#include <unistd.h>
+
+static void *start (void *arg)
+{
+  for (;;)
+    pause ();
+  /* NOTREACHED */
+  assert (0);
+  return arg;
+}
+
+int main (void)
+{
+  pthread_t thread;
+  int i;
+
+  i = pthread_create (&thread, NULL, start, NULL);	/* create1 */
+  assert (i == 0);
+
+  pthread_exit (NULL);
+  /* NOTREACHED */
+  assert (0);
+  return 0;
+}
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ ./gdb/testsuite/gdb.threads/leader-exit.exp	7 Jul 2007 15:21:57 -0000
@@ -0,0 +1,64 @@
+# Copyright (C) 2007 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  
+
+# Exit of the thread group leader should not break GDB.
+
+# This file was written by Jan Kratochvil <jan.kratochvil@redhat.com>.
+
+if $tracelevel then {
+	strace $tracelevel
+}
+
+set testfile "leader-exit"
+set srcfile ${testfile}.c
+set binfile ${objdir}/${subdir}/${testfile}
+
+if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug}] != "" } {
+    return -1
+}
+
+gdb_exit
+gdb_start
+gdb_reinitialize_dir $srcdir/$subdir
+gdb_load ${binfile}
+gdb_run_cmd
+
+proc stop_process { description } {
+  global gdb_prompt
+
+  # For this to work we must be sure to consume the "Continuing."
+  # message first, or GDB's signal handler may not be in place.
+  after 1000 {send_gdb "\003"}
+  gdb_expect {
+    -re "Program received signal SIGINT.*$gdb_prompt $"
+      {
+	pass $description
+      }
+    timeout
+      {
+	fail "$description (timeout)"
+      }
+  }
+}
+
+# Prevent races.
+sleep 8
+
+stop_process "Threads could be stopped"
+
+gdb_test "info threads" \
+         "\\* 2 Thread \[^\r\n\]* in \[^\r\n\]*" \
+         "Single thread has been left"


* Re: GDB hangs with simple multi-threaded program on linux
  2010-07-15 18:44 ` Tom Tromey
@ 2010-07-15 18:47   ` Tom Tromey
  2010-07-15 22:22   ` Thiago Jung Bauermann
  2010-07-16 15:58   ` Daniel Jacobowitz
  2 siblings, 0 replies; 5+ messages in thread
From: Tom Tromey @ 2010-07-15 18:47 UTC (permalink / raw)
  To: Thiago Jung Bauermann; +Cc: gdb

>>>>> "Tom" == Tom Tromey <tromey@redhat.com> writes:

Tom> Roland says we can do better if we enable exit tracing.  I see this
Tom> in linux-nat.c:
Tom>   /* Do not enable PTRACE_O_TRACEEXIT until GDB is more prepared to support
Tom>      read-only process state.  */
Tom> I wonder what that means :-)

BTW, if you plan to work on this, it is also PR 10970.

Tom


* Re: GDB hangs with simple multi-threaded program on linux
  2010-07-15 18:44 ` Tom Tromey
  2010-07-15 18:47   ` Tom Tromey
@ 2010-07-15 22:22   ` Thiago Jung Bauermann
  2010-07-16 15:58   ` Daniel Jacobowitz
  2 siblings, 0 replies; 5+ messages in thread
From: Thiago Jung Bauermann @ 2010-07-15 22:22 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb

On Thu, 2010-07-15 at 12:44 -0600, Tom Tromey wrote:
> >>>>> "Thiago" == Thiago Jung Bauermann <bauerman@br.ibm.com> writes:
> 
> Thiago> I'm struggling with an issue which perhaps you already faced or
> Thiago> thought about...
> 
> I asked around about this, and it turns out that we have a patch in the
> Fedora SRPM for it.

Thanks for looking into it!

> The approach in this patch seems to be racy.  Roland says we can do
> better if we enable exit tracing.  I see this in linux-nat.c:
> 
>   /* Do not enable PTRACE_O_TRACEEXIT until GDB is more prepared to support
>      read-only process state.  */
> 
> I wonder what that means :-)

I will play with that option and see what happens...

Thanks for sending the patch. It is racy but at least makes an effort to
avoid the trap, which is an improvement. :-)

> Thiago> 1. Is it true that when the main thread exits but there are other
> Thiago> threads in the thread group, then no SIGCHLD is generated to notify GDB
> Thiago> that it exited (perhaps because such a SIGCHLD could be ambiguous and
> Thiago> mean that the whole process exited)?
> 
> Yes, Roland said that no SIGCHLD is generated.

Bummer.

> Thiago> 2. Is there a way for GDB to wait on just the main thread instead of on
> Thiago> the whole process when it waits on a TID which is also the PID?
> 
> I guess not.

Ouch. Looks like we have our hands tied here. :-/

> BTW, if you plan to work on this, it is also PR 10970.

I do, and will update the PR accordingly. Thanks!
-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


* Re: GDB hangs with simple multi-threaded program on linux
  2010-07-15 18:44 ` Tom Tromey
  2010-07-15 18:47   ` Tom Tromey
  2010-07-15 22:22   ` Thiago Jung Bauermann
@ 2010-07-16 15:58   ` Daniel Jacobowitz
  2 siblings, 0 replies; 5+ messages in thread
From: Daniel Jacobowitz @ 2010-07-16 15:58 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Thiago Jung Bauermann, gdb

On Thu, Jul 15, 2010 at 12:44:32PM -0600, Tom Tromey wrote:
> The approach in this patch seems to be racy.  Roland says we can do
> better if we enable exit tracing.  I see this in linux-nat.c:
> 
>   /* Do not enable PTRACE_O_TRACEEXIT until GDB is more prepared to support
>      read-only process state.  */
> 
> I wonder what that means :-)

I meant to use this for "catch exit".  But once you reach
PTRACE_O_TRACEEXIT, the process is in a pretty unique state.  For
instance, you can't call a function - if you do, the process will exit
as soon as you resume it!
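
For readers who have not used exit tracing, the following is a minimal
standalone sketch of what a tracer sees with PTRACE_O_TRACEEXIT. It is
not GDB code; the function names observe_exit_event and abandon are
made up for this example. It traces a single forked child rather than
a thread group, and assumes ptrace is permitted in the environment
where it runs.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical cleanup helper: kill CHILD and reap it.  Returns -1 so
   callers can write "return abandon (child);".  */
static int
abandon (pid_t child)
{
  int status;

  kill (child, SIGKILL);
  while (waitpid (child, &status, 0) == child && WIFSTOPPED (status))
    ptrace (PTRACE_CONT, child, (void *) 0, (void *) 0);
  return -1;
}

/* Hypothetical demo: fork a tracee that calls _exit (EXIT_CODE), stop
   it at the exit event, and return the exit code the kernel reports
   there.  Returns -1 if anything unexpected happens.  */
static int
observe_exit_event (int exit_code)
{
  int status;
  unsigned long msg = 0;
  pid_t child = fork ();

  if (child == -1)
    return -1;
  if (child == 0)
    {
      /* Tracee: request tracing, stop so the parent can set options,
         then exit.  */
      ptrace (PTRACE_TRACEME, 0, (void *) 0, (void *) 0);
      raise (SIGSTOP);
      _exit (exit_code);
    }

  /* Wait for the initial SIGSTOP, then ask for exit-event stops.  */
  if (waitpid (child, &status, 0) != child || !WIFSTOPPED (status))
    return abandon (child);
  if (ptrace (PTRACE_SETOPTIONS, child, (void *) 0,
              (void *) (long) PTRACE_O_TRACEEXIT) == -1)
    return abandon (child);
  if (ptrace (PTRACE_CONT, child, (void *) 0, (void *) 0) == -1)
    return abandon (child);

  /* The tracee stops once more, after it has begun exiting but before
     the exit completes.  */
  if (waitpid (child, &status, 0) != child
      || !WIFSTOPPED (status)
      || (status >> 8) != (SIGTRAP | (PTRACE_EVENT_EXIT << 8)))
    return abandon (child);

  /* The event message holds the exit status in wait-status encoding.  */
  if (ptrace (PTRACE_GETEVENTMSG, child, (void *) 0, &msg) == -1)
    return abandon (child);

  /* Resuming lets the exit finish; only then can the child be reaped.  */
  ptrace (PTRACE_CONT, child, (void *) 0, (void *) 0);
  if (waitpid (child, &status, 0) != child || !WIFEXITED (status))
    return -1;

  return WEXITSTATUS ((int) msg);
}
```

At the PTRACE_EVENT_EXIT stop the tracee is already committed to
exiting, so the only thing resuming it can do is let the exit finish.
That matches the "pretty unique state" described above.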

-- 
Daniel Jacobowitz
CodeSourcery

