From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 22111 invoked by alias); 17 Sep 2014 22:13:39 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 22098 invoked by uid 89); 17 Sep 2014 22:13:38 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.8 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: e24smtp05.br.ibm.com Message-ID: <541A0787.2020804@linux.vnet.ibm.com> Date: Wed, 17 Sep 2014 22:13:00 -0000 From: Adhemerval Zanella User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: libc-alpha@sourceware.org Subject: Re: [RFC] Propose fix for race conditions in pthread cancellation (bz#12683) References: <5410C70E.70207@linux.vnet.ibm.com> <1410533066.4967.96.camel@triegel.csb> In-Reply-To: <1410533066.4967.96.camel@triegel.csb> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14091722-1798-0000-0000-0000007FC09C X-SW-Source: 2014-09/txt/msg00444.txt.bz2 On 12-09-2014 11:44, Torvald Riegel wrote: > On Wed, 2014-09-10 at 18:47 -0300, Adhemerval Zanella wrote: >> I have summarized in [1] the current issues with GLIBC pthread cancellation system, >> the current GLIBC implementation and the proposed solution by Rich Felker with the >> adjustment required to enabled it on GLIBC. >> >> It is still heavily WIP and I'm still planning to add more content, so any question, >> comments, advices are welcomed. >> >> The GLIBC adjustment to proposed solution is in fact the current work I'm doing to >> rewrite pthread cancellation subsystem [2]. My code still needs a *lot* of cleanup, >> but initial results are promising. It is building on both powerpc64 and x86_64 >> (it won't build on others platforms basically because I rewrite the way cancelable >> syscalls are done). > The general direction seems good to me. Thanks for working on this. I > have some questions/comments about the steps you outline in the "GLIBC > adjustment" section on the wiki page: > > "4. Adjust nptl/pthread_cancel.c to send an signal instead of acting > directly. This avoid synchronization issues about updating the > cancellation status and also focus the logic on signal handler and > cancellation syscall code." > > The current code seems focused on trying to avoid sending signal if > possible, seemingly because of performance concerns. My gut feeling is > that cancellation doesn't need to have low latency, but do we have > sufficient confidence that always sending the signal is alright? > > I believe we can still avoid sending the signal with the new scheme; > what we'd have to do is let the code for cancellable syscalls mark > (using atomic ops) whether it's currently executing or not (ie, > fetch_and_set to true and fetch_and_set to previous value); then the > pthread_cancel should see whether it needs to send a signal or not. > That adds a bit more synchronization, but seems possible. This solution seems feasible, but I would like to address it, if possible, on a next patch iterations. Right now, I would like to focus on implementation correctness for such solution. I will add on the wiki the suggestion. > > > "8. Adjust 'lowlevellock.h' arch-specific implementations to provide > cancelable futex calls (used in libpthread code)." > > Only FUTEX_WAIT should be cancellable, I suppose. Yes, I had to add cancel wrappers just for lll_futex_* that do FUTEX_WAIT. > > I've been thinking about whether it would be easier for the > implementation of cancellable pthreads functions to just call a > noncancellable FUTEX_WAIT and handle EINTR properly (by not retrying the > wait but calling pthread_testcancel instead so that any pending > cancellation is acted upon). However, this seems to save less > complexity than just doing for FUTEX_WAIT what we do for all the other > cancellable syscalls, so it's probably not worth it. > > > I'd add another thing to the list of steps, though: For future-proofing, > it would be good pthread_setcancelstate and pthread_setcanceltype would > have an explicit compiler barrier in them. If at some point we have LTO > including those functions, we use C11 atomics, and compilers start > optimizing those more aggressively, then I believe we're at risk that > the compiler reorders code across the atomics in pthread_setcancelstate; > this could lead to cancellation handlers seeing unexpected state when > they are actually run. C11 has tighter rules for what's allowed in > signal handlers (IIRC, these are still under discussion, at least in the > C++ committee) than what POSIX requires, yet the cancellation handler is > similar to a signal handler. A compiler barrier should hopefully > prevent any issues due to that mismatch. Thoughts? > I will add this on the future iterations for this patch, but since it is not really and issue right now, I prefer to work after this fix is corrected.