From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 13188 invoked by alias); 10 Jan 2014 21:31:54 -0000 Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Post: List-Help: , Sender: glibc-bugs-owner@sourceware.org Received: (qmail 13141 invoked by uid 48); 10 Jan 2014 21:31:48 -0000 From: "carlos at redhat dot com" To: glibc-bugs@sourceware.org Subject: [Bug nptl/12683] Race conditions in pthread cancellation Date: Fri, 10 Jan 2014 21:31:00 -0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: glibc X-Bugzilla-Component: nptl X-Bugzilla-Version: unspecified X-Bugzilla-Keywords: X-Bugzilla-Severity: critical X-Bugzilla-Who: carlos at redhat dot com X-Bugzilla-Status: NEW X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: unassigned at sourceware dot org X-Bugzilla-Target-Milestone: 2.19 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: http://sourceware.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-SW-Source: 2014-01/txt/msg00172.txt.bz2 https://sourceware.org/bugzilla/show_bug.cgi?id=12683 --- Comment #11 from Carlos O'Donell --- Alex Oliva and I talked about this particular issue today. We believe that an entirely userspace solution is possible without assistance from the kernel, but it requires signal wrappers. Signal wrappers are code that execute before and after a signal handler and does things like save and restore errno (the one use we have for them currently). The signal wrappers would assist in handling deferred cancellation. The proposed solution would look like this: * Stop enabling/disabling asynchronous cancellation around syscalls. * When a blocking library function who is also a cancellation point is entered a word in the thread's TCB (call it IN_SYSCALL) is set to the value of the stack pointer (we assume no further stack adjustments are made before the function exits). The value of IN_SYSCALL is cleared just before the function returns. Deferred cancellation is still checked before and after the syscall. * Add a signal wrapper to all signals that checks to see if IN_SYSCALL == SP stored in the ucontext_t and if it does it immediately cancels the thread. The check is done upon entry and exit of the wrapper to reduce cancellation latency. Just before unwinding the IN_SYCALL value is cleared. * When a thread starts we install a SIGCANCEL (SIGRTMIN) handler like we did before, but this handler checks to see if the thread's IN_SYSCALL matches the SP stored in ucontext_t, indicating that cancellation was requested while executing in the cancellation region of a blocking syscall (and no other signal handler executing). In that case the signal handler cancels the thread immediately. If IN_SYSCALL != SP then another signal handler is running and we defer the cancellation to the signal wrapper or syscall wrapper. The SIGCANCEL handler operates as it previously did when asynchronous cancellation was enabled. Resolved use cases: - Cancellation delivered between first instruction of function and IN_SYSCALL set: Syscall wrapper code will check for cancellation and act upon it. - Cancellation delivered between IN_SYSCALL set and syscall: The SIGCANCEL handler will immediately cancel the thread. - Cancellation delivered between syscall and clearing IN_SYSCALL: The SIGCANCEL handler will immediately cancel the thread. - Cancellation delivered between clearing of IN_SYSCALL and function return: The next cancellation point will act upon the cancellation (still meets POSIX requirement given escape clause of "The thread is suspended at a cancellation point and the event for which it is waiting occurs"). - Cancellation delivered and thread stopped at syscall is executing multiple nested signal handlers and the first signal handler has not checked IN_SYSCALL yet: Only the first signal delivered will have IN_SYSCALL == SP be true. The SIGCANCEL handler will do nothing. The first signal handler's wrapper will detect the cancellation is active and act upon it as it exits (only after all the other signal handlers have completed). - Cancellation delivered and thread stopped at syscall is executing multiple nested signal handlers and the first signal handler is exiting and has already checked IN_SYSCALL: The syscall will be interrupted and return. The syscall wrapper will act upon the cancellation request. The goal here is to have the signal handlers finish executing without interruption. Unresolved use cases: - Related to bug 14147 -- Cancellation delivered while thread is blocked on an async-safe function (in fact it's only executing async-safe functions during the time a signal can be delivered for this to be valid) and executing a signal handler that longjmp's out of the function. In this case IN_SYSCALL is still set to SP and not cleared. If by luck SP ends up the same, and another thread delivers a cancellation request the SIGCANCEL handler will immediately cancel the thread even though it was not in a cancellation region. - What if you are executing fork and someone tries to cancel you? A potential resolution to the first unresolved use case is to use a cleanup handler to reset IN_SYSCALL since such a handler is run when longjmp unwinds the frames. However we then need to consider cancellation during the execution of the cleanup. I haven't fully thought through what to do with the forking a multithreaded program case, but we should try to see if we can make it work. Note: Setting IN_SYSCALL must be atomic. -- You are receiving this mail because: You are on the CC list for the bug.