From: "dje at google dot com"
To: gdb-prs@sourceware.org
Subject: [Bug gdb/12702] New: gdb can hang waiting for thread group leader
Date: Tue, 26 Apr 2011 00:39:00 -0000

http://sourceware.org/bugzilla/show_bug.cgi?id=12702

           Summary: gdb can hang waiting for thread group leader
           Product: gdb
           Version: HEAD
            Status: NEW
          Severity: normal
          Priority: P2
         Component: gdb
        AssignedTo: unassigned@sourceware.org
        ReportedBy: dje@google.com

Created attachment 5685
  --> http://sourceware.org/bugzilla/attachment.cgi?id=5685
Patch to workaround the issue.

When ptracing, waitpid with options == 0 will hang if the thread group leader
exits while there are still other threads around.
[Blech! IWBN if it could at least return an error instead of hanging.]

When gdb detects that a thread has stopped, it stops the other threads before
returning control to the user (in "all-stop mode"). A race occurs when the
main thread exits after gdb detects that a second thread has stopped but
before gdb waits for the main thread to stop. If the main thread has exited
and there are other threads, then waitpid (main_thread_pid, &status, 0)
will hang.

This patch to gdb and the accompanying testcase illustrate the issue.
Apply to cvs head as of 11apr25.

diff -u -p -r1.199 linux-nat.c
--- linux-nat.c	9 Mar 2011 12:48:55 -0000	1.199
+++ linux-nat.c	25 Apr 2011 23:48:13 -0000
@@ -4047,6 +4108,21 @@ linux_thread_alive (ptid_t ptid)
 			target_pid_to_str (ptid),
 			err ? safe_strerror (tmp_errno) : "OK");
 
+  if (debug_linux_nat && GET_PID (ptid) != GET_LWP (ptid))
+    {
+      char buf[200];
+      sprintf (buf, "cat /proc/%ld/task/%ld/status",
+	       (long) GET_PID (ptid), (long) GET_LWP (ptid));
+      system (buf);
+      sleep (3);
+      err = kill_lwp (GET_LWP (ptid), 0);
+      fprintf_unfiltered (gdb_stdlog,
+			  "LLTA: KILL(SIG0) %s (%s)\n",
+			  target_pid_to_str (ptid),
+			  err ? safe_strerror (err) : "OK");
+      system (buf);
+    }
+
   if (err != 0)
     return 0;

bash$ cat testcase.c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Return the kernel thread id (LWP) of the calling thread.  */
int
gettid ()
{
  return syscall (__NR_gettid);
}

/* printf, followed by a flush so output is not lost in a buffer.  */
void
printf_flushed (const char *msg, ...)
{
  va_list args;
  va_start (args, msg);
  vprintf (msg, args);
  va_end (args);
  fflush (stdout);
}

/* Trap into the debugger from the second thread, then exit the thread.  */
void*
thread_function (void* dummy_ptr)
{
  printf_flushed ("Thread self 0x%lx, pid %d, lwp %d\n",
		  (unsigned long) pthread_self (), getpid (), gettid ());
  asm volatile ("int3");
  pthread_exit ((void *) 0);
  abort ();
}

int
main ()
{
  pthread_t thread_id;
  pthread_create (&thread_id, NULL, thread_function, NULL);
  /* Give the thread time to hit the int3 before the leader exits.  */
  sleep (1);
  return 0;
}

bash$ gcc -g testcase.c -o testcase.x64 -lpthread
bash$ ./gdb --batch -nx -ex "set debug infrun 1" -ex "set debug lin-lwp 1" -ex run -ex quit ./testcase.x64
[...]
LLW: waitpid 15050 received Trace/breakpoint trap (stopped)
LLTA: KILL(SIG0) Thread 0x7ffff783c700 (LWP 15050) (OK)
Name:   testcase.x64
State:  T (tracing stop)
Tgid:   15047
Pid:    15050
PPid:   15045
TracerPid:      15045
[...]
State:  Z (zombie)
Tgid:   15047
Pid:    15050
PPid:   15045
TracerPid:      15045
[...]
LLW: Candidate event Trace/breakpoint trap (stopped) in Thread 0x7ffff783c700 (LWP 15050).
SC: kill Thread 0x7ffff7fd7700 (LWP 15047) ****
SC: lwp kill 0 ERRNO-OK
hang

The attached patch works around the bug^wissue.

I suspect all calls to waitpid with options == 0 need to be audited and
either fixed or documented that they can't trip over this issue.
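
For anyone doing that audit, here is a rough sketch of one way a blocking
waitpid could be replaced. It is not the attached patch and not gdb code;
proc_state and wait_lwp_nohang are hypothetical names made up for this
example. The idea is to poll with WNOHANG and, whenever nothing is pending,
check /proc/PID/status to see whether the LWP has become a zombie. A zombie
that still has nothing to report after one more poll is assumed to be an
exited thread group leader whose other threads are still running, so the
caller gets 0 back instead of hanging.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Return the one-letter state from /proc/PID/status ('R', 'T', 'Z', ...),
   or '\0' if the file cannot be read.  Hypothetical helper.  */
char
proc_state (pid_t pid)
{
  char path[64], line[256], state = '\0';
  FILE *f;

  snprintf (path, sizeof path, "/proc/%ld/status", (long) pid);
  f = fopen (path, "r");
  if (f == NULL)
    return '\0';
  while (fgets (line, sizeof line, f) != NULL)
    if (sscanf (line, "State: %c", &state) == 1)
      break;
  fclose (f);
  return state;
}

/* Hypothetical replacement for waitpid (lwp, status, 0).  Returns LWP
   and fills *STATUS once an event is available, 0 if LWP is a zombie
   with nothing to report (assumed to be an exited thread group leader),
   or -1 on error with errno set by waitpid.  */
pid_t
wait_lwp_nohang (pid_t lwp, int *status)
{
  const struct timespec delay = { 0, 1000000 };	/* 1 ms between polls */

  for (;;)
    {
      pid_t ret = waitpid (lwp, status, __WALL | WNOHANG);

      if (ret == -1 && errno == EINTR)
	continue;
      if (ret != 0)
	return ret;

      if (proc_state (lwp) == 'Z')
	{
	  /* The LWP may have exited between the poll above and the
	     /proc check, so poll once more before giving up.  */
	  ret = waitpid (lwp, status, __WALL | WNOHANG);
	  if (ret == -1 && errno == EINTR)
	    continue;
	  return ret;
	}

      nanosleep (&delay, NULL);
    }
}
```

A caller would treat a return value of 0 as "the leader is gone, stop
waiting for it" and carry on with the remaining threads. Polling costs some
latency and CPU compared with a blocking wait, so the 1 ms delay here is
only a placeholder.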