From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 18854 invoked by alias); 12 Aug 2014 02:29:41 -0000
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help:
Sender: glibc-bugs-owner@sourceware.org
Received: (qmail 18648 invoked by uid 48); 12 Aug 2014 02:29:24 -0000
From: "bugdal at aerifal dot cx"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/13690] pthread_mutex_unlock potentially cause invalid access
Date: Tue, 12 Aug 2014 02:29:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.15
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: bugdal at aerifal dot cx
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: carlos at redhat dot com
X-Bugzilla-Target-Milestone: 2.18
X-Bugzilla-Flags: review?
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-08/txt/msg00032.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=13690

--- Comment #48 from Rich Felker ---
> This would increase the unlock latency whenever there is any waiter (because
> we let the kernel do it, and after it has found and acquired the futex lock).
> I don't have numbers for this increase, but if there's a non-negligible
> increase in latency, then I wouldn't want to see this in glibc.

Torvald, I agree you have a legitimate concern (unlock latency). While I
don't have evidence to back this up (just high-level reasoning), I think
the difference in the time at which the atomic store takes effect actually
works in favor of performance with FUTEX_WAKE_OP.
I'll try to explain:

In the case where there is no waiter at the time of unlock, no wake
occurs, neither by FUTEX_WAKE nor by FUTEX_WAKE_OP. There's only an atomic
operation (CAS, if we want to fix the bug this whole issue tracker thread
is about). So for the sake of comparing performance, we need to consider
the case where there is at least one waiter.

Right now (with FUTEX_WAKE), there's a great deal of latency between the
atomic operation that releases the lock and the FUTEX_WAKE being
dispatched, due to kernel entry overhead, futex hash overhead, etc. During
that window, a thread which is not a waiter can race for the lock and
acquire it first, despite there being waiters. This acquisition inherently
happens with very low latency, but I think it's actually likely to be bad
for performance:

If the thread which "stole" the lock has not released it by the time the
thread woken by FUTEX_WAKE gets scheduled, the latter thread will
uselessly contend for the lock again, imposing additional cache
synchronization overhead and an additional syscall to wait on the futex
again. It will also wrongly get moved to the end of the wait queue.

If, on the other hand, the thread which "stole" the lock releases it
immediately, before the woken thread gets scheduled, my understanding is
that it will see that there are waiters and issue an additional FUTEX_WAKE
at unlock time. At the very least this is a wasted syscall. If there
actually are two or more waiters, it's a lot more expensive, since an
extra thread wakes up only to contend for the lock and re-wait.

As both of these situations seem undesirable to me, I think the optimal
behavior is to minimize the latency between the atomic-release operation
that makes the lock available to other threads and the futex wake. And the
only way to make this latency small is to perform the atomic release in
kernel space.
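
To make the comparison concrete, here is a rough sketch of the two unlock
strategies. It assumes a three-state lock word (0 = unlocked, 1 = locked,
2 = locked with possible waiters); that encoding, the helper names
(futex_sketch, unlock_then_wake, unlock_in_kernel) and the particular
FUTEX_OP encoding are all my own choices for illustration, not glibc's
code, and neither variant is claimed to fix the invalid-access bug itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock-word states (an assumption, not glibc's encoding):
   0 = unlocked, 1 = locked, 2 = locked and waiters may exist. */
static_assert(sizeof(atomic_int) == 4, "futex words are 32 bits");

/* Thin wrapper over the raw syscall; the name is made up for this sketch. */
static long futex_sketch(atomic_int *uaddr, int op, unsigned val,
                         unsigned long val2, atomic_int *uaddr2,
                         unsigned val3)
{
    return syscall(SYS_futex, uaddr, op, val, val2, uaddr2, val3);
}

/* Variant A: release in userspace, then ask the kernel to wake a waiter. */
static long unlock_then_wake(atomic_int *word)
{
    int prev = atomic_exchange_explicit(word, 0, memory_order_release);
    assert(prev != 0);          /* caller must hold the lock */
    if (prev == 1)
        return 0;               /* no waiter recorded: no syscall at all */
    /* Window: the lock has been free since the exchange above, but no
       waiter is woken until the kernel has processed this call. Any
       running thread can take the lock in between. */
    return futex_sketch(word, FUTEX_WAKE | FUTEX_PRIVATE_FLAG, 1, 0, NULL, 0);
}

/* Variant B: with waiters present, let the kernel do the release. */
static long unlock_in_kernel(atomic_int *word)
{
    int expected = 1;
    /* Fast path: no waiter recorded, so a plain CAS 1 -> 0 suffices. */
    if (atomic_compare_exchange_strong_explicit(word, &expected, 0,
                                                memory_order_release,
                                                memory_order_relaxed))
        return 0;
    assert(expected == 2);      /* caller must hold the lock */
    /* Slow path: FUTEX_WAKE_OP stores 0 to *word (uaddr2) and wakes up to
       one waiter on word (uaddr) inside a single syscall. The comparison
       "old value < 0" is never true for states 0..2, so the second,
       conditional wake on uaddr2 is not used here. */
    return futex_sketch(word, FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG, 1, 0, word,
                        FUTEX_OP(FUTEX_OP_SET, 0, FUTEX_OP_CMP_LT, 0));
}
```

Both functions return the number of waiters woken (0 when the fast path
is taken or nobody was actually sleeping). In variant A the window for
"stealing" spans kernel entry and the futex hash lookup; in variant B the
store and the wake happen back to back in the kernel.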

> I still think that the only thing we need to fix is to make sure that no
> program can interpret a spurious wake-up (by a pending futex_wake) as a real
> wake-up.

As I understand it, all of the current code treats futex wakes much like
POSIX condition variable waits: as an indication to re-check an external
predicate rather than as the bearer of notification about state. If not,
things would already be a lot more broken than they are now with regard to
this issue.

On the other hand, if you eliminate all sources of spurious wakes, I think
it's possible to achieve better behavior; in particular, I think it may be
possible to prevent "stealing" of locks entirely and ensure that the next
futex waiter always gets the lock on unlock. Whether this behavior is
desirable for glibc or not, I'm not sure. I'm going to do research on it
as a future possibility for musl.
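
The "re-check the predicate" convention can be sketched on the lock side
roughly as follows. This assumes the same made-up three-state lock word
(0 = unlocked, 1 = locked, 2 = locked with possible waiters); the names
futex_wait_sketch and lock_sketch are hypothetical and this is not the
glibc or musl implementation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static_assert(sizeof(atomic_int) == 4, "futex words are 32 bits");

/* Sleep only if *uaddr still equals expected. Returns 0 after a wake, or
   -1 with errno set (EAGAIN if the value already differed, EINTR on a
   signal). */
static long futex_wait_sketch(atomic_int *uaddr, int expected)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
                   expected, NULL, NULL, 0);
}

/* Acquire the lock; returns how many times futex_wait_sketch returned
   before the lock was obtained. */
static int lock_sketch(atomic_int *word)
{
    int returns = 0;
    int c = 0;
    /* Uncontended path: 0 -> 1. */
    if (atomic_compare_exchange_strong_explicit(word, &c, 1,
                                                memory_order_acquire,
                                                memory_order_relaxed))
        return returns;
    /* Contended: advertise a waiter by moving the word to 2. */
    if (c != 2)
        c = atomic_exchange_explicit(word, 2, memory_order_acquire);
    while (c != 0) {
        /* A return from the wait carries no information by itself. It
           may be a real wake, a stale or spurious one, EAGAIN or EINTR. */
        futex_wait_sketch(word, 2);
        returns++;
        /* The predicate is the lock word: only seeing 0 here means the
           lock was free and is now ours (held in state 2). */
        c = atomic_exchange_explicit(word, 2, memory_order_acquire);
    }
    return returns;
}
```

Because the loop decides purely on the value of the lock word, a wake
left over from an earlier unlock costs an extra iteration but cannot be
mistaken for ownership of the lock.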