From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 19549 invoked by alias); 15 Jan 2015 13:20:33 -0000
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: glibc-bugs-owner@sourceware.org
Received: (qmail 18617 invoked by uid 48); 15 Jan 2015 13:20:21 -0000
From: "dan at censornet dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/12683] Race conditions in pthread cancellation
Date: Thu, 15 Jan 2015 13:20:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords:
X-Bugzilla-Severity: critical
X-Bugzilla-Who: dan at censornet dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: 2.19
X-Bugzilla-Flags: security-
X-Bugzilla-Changed-Fields: cc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-01/txt/msg00123.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=12683

Dan Searle changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dan at censornet dot com

--- Comment #25 from Dan Searle ---
I think we have stumbled upon this bug, or something related to it. Can
someone please confirm I'm on the right track here?

We have a multithreaded server application which calls recv() and poll()
from asynchronously cancellable threads. Each thread handles a single
connection, with a master thread accepting new connections and adding
them to a job queue. More and more often now we are seeing the server
lock up, and on inspection two or more threads seem deadlocked in some
race condition inside libc recv() and/or poll().

One example here shows two backtraces from gdb, taken from the two
threads that seemed deadlocked and were chewing 100% CPU:

Thread 1 bt:
#0  __pthread_disable_asynccancel () at ../nptl/sysdeps/unix/sysv/linux/x86_64/cancellation.S:98
#1  0x00007f895ba987fd in __libc_recv (fd=0, fd@entry=33, buf=buf@entry=0x7cada02b, n=n@entry=1024, flags=1537837035, flags@entry=16384) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:35
#2  0x000000000040ec54 in recv (__flags=16384, __n=1024, __buf=0x7cada02b, __fd=33) at /usr/include/x86_64-linux-gnu/bits/socket2.h:44
[snip]

Thread 2 bt:
#0  0x00007f895ba987eb in __libc_recv (fd=fd@entry=31, buf=buf@entry=0x7ca5e02b, n=n@entry=1024, flags=-1, flags@entry=16384) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x000000000040ec54 in recv (__flags=16384, __n=1024, __buf=0x7ca5e02b, __fd=31) at /usr/include/x86_64-linux-gnu/bits/socket2.h:44
[snip]

There can be more than two threads involved. I'm unsure whether it can
happen with just one thread locked up, but it's always inside recv() or
poll(), and sometimes in __pthread_disable_asynccancel() within either
of those.

Could I work around this problem by changing the threads to
synchronously cancellable (deferred cancellation), or by trying to
avoid the need to cancel the threads at all?
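
On the second option raised above (not cancelling the threads at all),
here is a minimal sketch of one way it might look, assuming each worker
can poll() an extra descriptor next to its connection. The names
struct worker, worker_start() and worker_stop() are hypothetical and
are not taken from the application in this report, and bytes_seen is
only a stand-in for real request handling. Whether this avoids the
lock-up described in this bug has not been confirmed; it only removes
pthread_cancel() from the picture.

```c
/* Sketch only: stop a per-connection worker through a wake-up pipe
 * instead of pthread_cancel(). struct worker, worker_start() and
 * worker_stop() are hypothetical names, not taken from the bug report. */
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct worker {
    pthread_t tid;
    int conn_fd;        /* connection served by this thread */
    int stop_pipe[2];   /* [0]: polled by the worker, [1]: written by the master */
    size_t bytes_seen;  /* stand-in for real request handling */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    char buf[1024];

    assert(w != NULL);
    for (;;) {
        struct pollfd fds[2] = {
            { .fd = w->conn_fd,      .events = POLLIN },
            { .fd = w->stop_pipe[0], .events = POLLIN },
        };

        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            break;                  /* unexpected error: give up */
        }
        if (fds[0].revents) {
            /* The socket is reported ready; MSG_DONTWAIT guards against blocking anyway. */
            ssize_t n = recv(w->conn_fd, buf, sizeof buf, MSG_DONTWAIT);
            if (n > 0)
                w->bytes_seen += (size_t)n;
            else if (n == 0 || (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR))
                break;              /* peer closed, or a real error */
        }
        if (fds[1].revents)
            break;                  /* master asked this thread to stop */
    }
    return NULL;
}

/* Starts a worker for conn_fd. Returns 0 on success, -1 on failure. */
int worker_start(struct worker *w, int conn_fd)
{
    w->conn_fd = conn_fd;
    w->bytes_seen = 0;
    if (pipe(w->stop_pipe) != 0)
        return -1;
    if (pthread_create(&w->tid, NULL, worker_main, w) != 0) {
        close(w->stop_pipe[0]);
        close(w->stop_pipe[1]);
        return -1;
    }
    return 0;
}

/* Wakes the worker, waits for it to exit, and releases the pipe. */
int worker_stop(struct worker *w)
{
    char byte = 0;

    if (write(w->stop_pipe[1], &byte, 1) != 1)
        return -1;                  /* could not signal; worker is still running */
    if (pthread_join(w->tid, NULL) != 0)
        return -1;
    close(w->stop_pipe[0]);
    close(w->stop_pipe[1]);
    return 0;
}
```

In this sketch the master thread would call worker_start() after
accept() and worker_stop() wherever it currently calls
pthread_cancel(). The worker only ever leaves its loop at a point of
its own choosing, so no cleanup handlers are needed. The cost is two
extra descriptors per connection.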