Subject: Re: [PATCH][BZ #20973] Robust mutexes: Fix lost wake-up.
From: Carlos O'Donell
To: Torvald Riegel, GLIBC Devel
Cc: vl@samba.org, Michael Adam, Rich Felker
Date: Mon, 19 Dec 2016 18:20:00 -0000
In-Reply-To: <1481840946.14990.588.camel@redhat.com>
References: <1481840825.14990.586.camel@redhat.com> <1481840946.14990.588.camel@redhat.com>

On 12/15/2016 05:29 PM, Torvald Riegel wrote:
> On Thu, 2016-12-15 at 23:27 +0100, Torvald Riegel wrote:
>> See patch for a description.
>>
>> Tested on x86_64-linux with our tests and the test case from the
>> original bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1401665
>>
>> OK?

This looks good to me. Detailed review below.

> Now with an attached patch...
>
>
> robust-mutex-lwu.patch
>
> commit 74be3b28d962a5a6d2eeff51b93d61ddf91e2d49
> Author: Torvald Riegel
> Date:   Thu Dec 15 16:06:28 2016 +0100
>
>     Robust mutexes: Fix lost wake-up.
>
>     Assume that Thread 1 waits to acquire a robust mutex using futexes to
>     block (and thus sets the FUTEX_WAITERS flag), and is unblocked when this
>     mutex is released.  If Thread 2 concurrently acquires the lock and is
>     killed, Thread 1 can recover from the died owner but fail to restore the
>     FUTEX_WAITERS flag.  This can lead to a Thread 3 that also blocked using
>     futexes at the same time as Thread 1 to not get woken up because
>     FUTEX_WAITERS is not set anymore.

Just to reiterate (s = sleeping in a futex wait; * = T3 is never woken):

T0      T1      T2      T3
|       |       |       |
LOCK    |       |       |
|       WAIT    |       |
|       s       |       WAIT
UNLOCK  s       |       s
WAKE    AWAKE   |       s
|       |       LOCK    s
|       |       DIES    s
|       LOCK            s
|       UNLOCK          s
|       NOWAKE          *

At the point T0 unlocks and wakes T1, we have both T1 and T2 attempting
to acquire the same lock, which has a value of 0 (no waiters, since the
shared waiter flag was cleared).
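As a cross-check, here is a minimal, single-threaded sketch that just
replays these futex-word transitions (my illustration, not glibc code;
the tid values 100 and 102 and the step labels are invented):

/* Replay of the futex-word transitions in the diagram above.  Each
   step applies the change the kernel or the locking code would make
   at that point in the interleaving.  */
#include <linux/futex.h>   /* FUTEX_WAITERS, FUTEX_OWNER_DIED */
#include <stdio.h>

static unsigned int futex_word;

static void
step (const char *what)
{
  printf ("%-36s futex word = 0x%08x\n", what, futex_word);
}

int
main (void)
{
  const unsigned int t0_tid = 100, t2_tid = 102;

  futex_word = t0_tid;           step ("T0 LOCK");
  futex_word |= FUTEX_WAITERS;   step ("T1/T3 WAIT, set FUTEX_WAITERS");
  futex_word = 0;                step ("T0 UNLOCK+WAKE, flag cleared");
  futex_word = t2_tid;           step ("T2 LOCK, no FUTEX_WAITERS");
  /* The kernel's robust-futex cleanup clears the tid and sets
     FUTEX_OWNER_DIED; it would preserve FUTEX_WAITERS, but the flag is
     already gone because T0's unlock cleared it.  */
  futex_word = FUTEX_OWNER_DIED; step ("T2 DIES, kernel cleanup");
  /* T1's CAS of tid | FUTEX_WAITERS against 0 now fails; it sees
     FUTEX_OWNER_DIED and takes over the lock without re-adding
     FUTEX_WAITERS, so T1's eventual unlock never wakes T3.  */
  return 0;
}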
If T2 takes the lock (no FUTEX_WAITERS set) and dies, the futex value
is reset to FUTEX_OWNER_DIED by the kernel (note that the kernel
preserves FUTEX_WAITERS, but it was not set in this case because T0's
unlock cleared it). In the case of T1, the following loop is executing:

27 int
28 __lll_robust_lock_wait (int *futex, int private)
29 {
37   do
38     {
39       /* If the owner died, return the present value of the futex.  */
40       if (__glibc_unlikely (oldval & FUTEX_OWNER_DIED))
41         return oldval;
42
43       /* Try to put the lock into state 'acquired, possibly with waiters'.  */
44       int newval = oldval | FUTEX_WAITERS;
45       if (oldval != newval
46           && atomic_compare_and_exchange_bool_acq (futex, newval, oldval))
47         continue;
48
49       /* If *futex == 2, wait until woken.  */
50       lll_futex_wait (futex, newval, private);

We have just been woken. T2 acquires the lock, dies, and the lock is
cleaned up by the kernel. Nobody is waiting on the lock.

52     try:
53       ;
54     }
55   while ((oldval = atomic_compare_and_exchange_val_acq (futex,
56                                                         tid | FUTEX_WAITERS,
57                                                         0)) != 0);

This CAS fails because the value is not 0; it is FUTEX_OWNER_DIED. We
loop back up to the top, the FUTEX_OWNER_DIED check matches, and we
return oldval to the higher-level code, which has now lost
FUTEX_WAITERS. At this point T3 has lost the wake-up it needs to make
progress, because T1 has "forgotten" that there were waiters.

>     The fix for this is to ensure that we continue to preserve the
>     FUTEX_WAITERS flag whenever we may have set it or shared it with another
>     thread.  This is the same requirement as in the algorithm for normal
>     mutexes, only that the robust mutexes need additional handling for died
>     owners and thus preserving the FUTEX_WAITERS flag cannot be done just in
>     the futex slowpath code.

Agreed. Just highlighting from above:

39       /* If the owner died, return the present value of the futex.  */
40       if (__glibc_unlikely (oldval & FUTEX_OWNER_DIED))
41         return oldval;

Here we return the shared oldval with FUTEX_WAITERS lost.

I struggled with the possibility of a simpler fix that just changed the
returned oldval, but you correctly pointed out to me that such a fix
breaks the abstraction: the calling algorithms rely on oldval being the
value previously read from memory. Therefore I agree that this needs to
be handled at a higher level, which must also re-insert FUTEX_WAITERS
as required, as is done here.

With your patch the only two lock paths we have are now covered; they
correctly re-add FUTEX_WAITERS if we had a slow-path wait that might
have lost the FUTEX_WAITERS flag due to a concurrent lock/death
sequence. I don't immediately see any other broken paths where
FUTEX_WAITERS can be dropped and the wake-up lost.
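To make that invariant concrete, here is a stripped-down model of the
fixed acquire loop (my sketch, not the glibc code: robust-list
registration, NO_INCR, and the timed path are omitted;
robust_lock_model and futex_wait are made-up names; C11 atomics stand
in for glibc's atomic macros). The key point is that once we have
blocked even once, we may have consumed a wake-up that other waiters
depended on, so every later attempt, including the owner-died recovery
CAS, must OR FUTEX_WAITERS back in:

#include <linux/futex.h>   /* FUTEX_WAIT, FUTEX_WAITERS, FUTEX_OWNER_DIED */
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static void
futex_wait (atomic_uint *word, unsigned int expected)
{
  /* Returns when *word != expected or after a wake-up.  */
  syscall (SYS_futex, word, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Returns 0 for a normal acquisition, or FUTEX_OWNER_DIED if we took
   over a lock whose owner died (the caller must recover the state).  */
static unsigned int
robust_lock_model (atomic_uint *word, unsigned int tid)
{
  /* Becomes FUTEX_WAITERS once we might share the flag with others.  */
  unsigned int assume_other_futex_waiters = 0;

  for (;;)
    {
      /* Try to grab a free lock, carrying over the waiters we may have
         hidden by consuming a wake-up.  */
      unsigned int oldval = 0;
      if (atomic_compare_exchange_strong (word, &oldval,
                                          tid | assume_other_futex_waiters))
        return 0;

      if (oldval & FUTEX_OWNER_DIED)
        {
          /* Owner died: take over the lock, preserving both the
             observed waiters bit and the one we may have shared.  */
          unsigned int newval = tid | (oldval & FUTEX_WAITERS)
                                | assume_other_futex_waiters;
          if (atomic_compare_exchange_strong (word, &oldval, newval))
            return FUTEX_OWNER_DIED;
          continue;
        }

      /* Lock held by a live owner: advertise ourselves as a waiter and
         block.  */
      unsigned int waitval = oldval | FUTEX_WAITERS;
      if (oldval != waitval
          && !atomic_compare_exchange_strong (word, &oldval, waitval))
        continue;
      futex_wait (word, waitval);

      /* From here on, FUTEX_WAITERS may be shared with other blocked
         threads; it must never be dropped again.  */
      assume_other_futex_waiters = FUTEX_WAITERS;
    }
}

int
main (void)
{
  static atomic_uint lock;
  unsigned int tid = (unsigned int) syscall (SYS_gettid);
  return (int) robust_lock_model (&lock, tid);  /* uncontended acquire */
}

In the buggy version, the recovery path used only
tid | (oldval & FUTEX_WAITERS) and the slow-path lock was re-attempted
with just the tid; those are exactly the sites the patch below changes.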
>     	[BZ #20973]
>     	* nptl/pthread_mutex_lock.c (__pthread_mutex_lock_full): Fix lost
>     	wake-up in robust mutexes.
>     	* nptl/pthread_mutex_timedlock.c (pthread_mutex_timedlock): Likewise.
>
> diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
> index bdfa529..01ac75e 100644
> --- a/nptl/pthread_mutex_lock.c
> +++ b/nptl/pthread_mutex_lock.c
> @@ -182,6 +182,11 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex)
>                               &mutex->__data.__list.__next);
>
>        oldval = mutex->__data.__lock;
> +      /* This is set to FUTEX_WAITERS iff we might have shared the
> +         FUTEX_WAITERS flag with other threads, and therefore need to keep it
> +         set to avoid lost wake-ups.  We have the same requirement in the
> +         simple mutex algorithm.  */
> +      unsigned int assume_other_futex_waiters = 0;

OK.

>        do
>          {
>          again:
> @@ -190,9 +195,9 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex)
>                /* The previous owner died.  Try locking the mutex.  */
>                int newval = id;
>  #ifdef NO_INCR
> -              newval |= FUTEX_WAITERS;
> +              newval |= FUTEX_WAITERS | assume_other_futex_waiters;
>  #else
> -              newval |= (oldval & FUTEX_WAITERS);
> +              newval |= (oldval & FUTEX_WAITERS) | assume_other_futex_waiters;

OK.

>  #endif
>
>                newval
> @@ -253,7 +258,11 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex)
>              }
>          }
>
> -      oldval = LLL_ROBUST_MUTEX_LOCK (mutex, id);
> +      oldval = LLL_ROBUST_MUTEX_LOCK (mutex,
> +                                      id | assume_other_futex_waiters);
> +      /* See above.  We set FUTEX_WAITERS and might have shared this flag
> +         with other threads; thus, we need to preserve it.  */
> +      assume_other_futex_waiters = FUTEX_WAITERS;

OK.

>
>        if (__builtin_expect (mutex->__data.__owner
>                              == PTHREAD_MUTEX_NOTRECOVERABLE, 0))
> diff --git a/nptl/pthread_mutex_timedlock.c b/nptl/pthread_mutex_timedlock.c
> index 07f0901..21e9eea 100644
> --- a/nptl/pthread_mutex_timedlock.c
> +++ b/nptl/pthread_mutex_timedlock.c
> @@ -142,13 +142,19 @@ pthread_mutex_timedlock (pthread_mutex_t *mutex,
>                               &mutex->__data.__list.__next);
>
>        oldval = mutex->__data.__lock;
> +      /* This is set to FUTEX_WAITERS iff we might have shared the
> +         FUTEX_WAITERS flag with other threads, and therefore need to keep it
> +         set to avoid lost wake-ups.  We have the same requirement in the
> +         simple mutex algorithm.  */
> +      unsigned int assume_other_futex_waiters = 0;

OK.

>        do
>          {
>          again:
>            if ((oldval & FUTEX_OWNER_DIED) != 0)
>              {
>                /* The previous owner died.  Try locking the mutex.  */
> -              int newval = id | (oldval & FUTEX_WAITERS);
> +              int newval = id | (oldval & FUTEX_WAITERS)
> +                  | assume_other_futex_waiters;

OK.

>
>                newval
>                  = atomic_compare_and_exchange_val_acq (&mutex->__data.__lock,
> @@ -203,8 +209,12 @@ pthread_mutex_timedlock (pthread_mutex_t *mutex,
>              }
>          }
>
> -      result = lll_robust_timedlock (mutex->__data.__lock, abstime, id,
> +      result = lll_robust_timedlock (mutex->__data.__lock, abstime,
> +                                     id | assume_other_futex_waiters,
>                                       PTHREAD_ROBUST_MUTEX_PSHARED (mutex));
> +      /* See above.  We set FUTEX_WAITERS and might have shared this flag
> +         with other threads; thus, we need to preserve it.  */
> +      assume_other_futex_waiters = FUTEX_WAITERS;

OK.

>
>        if (__builtin_expect (mutex->__data.__owner
>                              == PTHREAD_MUTEX_NOTRECOVERABLE, 0))

--
Cheers,
Carlos.