Subject: GDB hangs with simple multi-threaded program on linux
From: Thiago Jung Bauermann
To: gdb@sourceware.org
Date: Thu, 15 Jul 2010 15:46:00 -0000
Message-ID: <1279208729.14577.21.camel@hactar>

Hi,

I'm struggling with an issue which perhaps you already faced or thought about...

The following testcase locks GDB nearly every time on Linux:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 2

pthread_t main_thread;

void *
print_hello (void *threadid)
{
  int tid = (int) threadid;

  printf ("Hello world! It's me, thread #%d!\n", tid);

  /* The first thread will wait for main to terminate.  */
  if (tid == 0)
    pthread_join (main_thread, NULL);

  pthread_exit (NULL);
}

int
main (int argc, char *argv[])
{
  int i, rc;
  pthread_t threads[NUM_THREADS];

  main_thread = pthread_self ();

  for (i = 0; i < NUM_THREADS; i++)
    {
      printf ("In main: creating thread %d\n", i);
      rc = pthread_create (&threads[i], NULL, print_hello, (void *) i);
      if (rc)
        {
          printf ("ERROR; return code from pthread_create is %d\n", rc);
          exit (-1);
        }
    }

  pthread_exit (NULL);
}

What's special about this testcase is that the main thread exits earlier than the threads it creates.

When GDB is notified about a signal in some thread, it sends a SIGSTOP to the other threads in the process and then calls waitpid on them to make sure that the threads have indeed stopped (at the end of linux_nat_wait_1, when it calls stop_callback and stop_wait_callback on all LWPs).

Normally this is ok, but what is happening here is that when GDB is notified about a signal in some thread, the main thread has already exited (but GDB is oblivious to this fact). GDB sends a SIGSTOP to every thread in the debuggee (including the zombie main thread), and then, when it goes on to wait on those threads, it hangs while waiting on the main thread.

I suspect that waitpid interprets the call to wait on the main thread to actually mean waiting on the whole program instead (since TID == PID in this case) and hangs because there are other threads in the thread group (even though they are in the tracing stop state).

So my questions are:

1.
Is it true that when the main thread exits but there are other threads in the thread group, no SIGCHLD is generated to notify GDB that it exited (perhaps because such a SIGCHLD could be ambiguous and mean that the whole process exited)? If so, how can GDB learn when the main thread exits? That would be why GDB thinks the main thread is still around. Either that, or GDB missed the SIGCHLD, or it is later in the queue and not yet processed.

2. Is there a way for GDB to wait on just the main thread instead of on the whole process when it waits on a TID which is also the PID?

--
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center
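
For reference, the stop-and-wait step described earlier (send SIGSTOP, then call waitpid to confirm the stop) can be sketched in isolation. This is only a minimal sketch under simplifying assumptions, not GDB code: it uses a plain forked child and WUNTRACED rather than ptrace-attached LWPs, and the helper names stop_and_confirm, spawn_idle_child and kill_and_reap are made up for illustration.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not GDB code: send SIGSTOP to a child process
   and block in waitpid until the stop is reported.  Returns 1 if the
   child was reported as stopped by SIGSTOP, 0 if waitpid reported some
   other event, and -1 on error.  */
int
stop_and_confirm (pid_t pid)
{
  int status;

  if (kill (pid, SIGSTOP) == -1)
    return -1;

  /* WUNTRACED makes waitpid report stopped children as well as
     terminated ones.  The call blocks until there is an event to
     report for PID.  */
  if (waitpid (pid, &status, WUNTRACED) == -1)
    return -1;

  return WIFSTOPPED (status) && WSTOPSIG (status) == SIGSTOP;
}

/* Hypothetical helper: fork a child that does nothing but sleep in
   pause ().  Returns the child's PID in the parent, or -1 on error.  */
pid_t
spawn_idle_child (void)
{
  pid_t pid = fork ();

  if (pid == 0)
    {
      for (;;)
        pause ();
    }

  return pid;
}

/* Hypothetical helper: kill the child and reap it.  Returns 1 if the
   child was reported as terminated by SIGKILL, 0 if waitpid reported
   some other event, and -1 on error.  */
int
kill_and_reap (pid_t pid)
{
  int status;

  if (kill (pid, SIGKILL) == -1)
    return -1;

  if (waitpid (pid, &status, 0) == -1)
    return -1;

  return WIFSIGNALED (status) && WTERMSIG (status) == SIGKILL;
}
```

Calling spawn_idle_child, then stop_and_confirm and kill_and_reap on the returned PID, exercises the whole sequence. In the hang described above, GDB is blocked in the waiting half of this handshake, on the main thread.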