From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <glibc-bugs-return-25949-listarch-glibc-bugs=sources.redhat.com@sourceware.org>
Received: (qmail 18854 invoked by alias); 12 Aug 2014 02:29:41 -0000
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <glibc-bugs.sourceware.org>
List-Subscribe: <mailto:glibc-bugs-subscribe@sourceware.org>
List-Post: <mailto:glibc-bugs@sourceware.org>
List-Help: <mailto:glibc-bugs-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: glibc-bugs-owner@sourceware.org
Received: (qmail 18648 invoked by uid 48); 12 Aug 2014 02:29:24 -0000
From: "bugdal at aerifal dot cx" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/13690] pthread_mutex_unlock potentially cause invalid access
Date: Tue, 12 Aug 2014 02:29:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.15
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: bugdal at aerifal dot cx
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: carlos at redhat dot com
X-Bugzilla-Target-Milestone: 2.18
X-Bugzilla-Flags: review?
X-Bugzilla-Changed-Fields:
Message-ID: <bug-13690-131-0MXRXrPOFG@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-13690-131@http.sourceware.org/bugzilla/>
References: <bug-13690-131@http.sourceware.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-08/txt/msg00032.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=13690
--- Comment #48 from Rich Felker <bugdal at aerifal dot cx> ---
> This would increase the unlock latency whenever there is any waiter (because
> we let the kernel do it, and after it has found and acquired the futex lock). 
> I don't have numbers for this increase, but if there's a non-negligible increase
> in latency, then I wouldn't want to see this in glibc.

Torvald, I agree you have a legitimate concern (unlock latency). While I don't
have evidence to back this up (just high-level reasoning), I think the
difference in the time at which the atomic store happens actually works in
favor of performance with FUTEX_WAKE_OP. I'll try to explain:

In the case where there is no waiter at the time of unlock, no wake occurs,
either by FUTEX_WAKE or by FUTEX_WAKE_OP. There's only an atomic operation
(CAS, if we want to fix the bug this whole issue tracker thread is about). So
for the sake of comparing performance, we need to consider the case where there
is at least one waiter.
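
A minimal sketch of that no-waiter path, assuming a hypothetical three-state
lock word (0 = unlocked, 1 = locked, 2 = locked with possible waiters). This
encoding and the function name are assumptions for illustration, not glibc's
actual code:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock states for this sketch (not glibc's real encoding). */
enum { UNLOCKED = 0, LOCKED = 1, LOCKED_WAITERS = 2 };

/* Try to release the lock with a single CAS from LOCKED to UNLOCKED.
   Returns true if that succeeded, meaning no waiter was recorded and no
   futex syscall is needed. Returns false, leaving the lock word
   untouched, if the word was in any other state; the caller would then
   take a slow path that involves a wake. */
static bool unlock_fast_path(atomic_int *state)
{
    int expected = LOCKED;
    return atomic_compare_exchange_strong_explicit(
        state, &expected, UNLOCKED,
        memory_order_release, memory_order_relaxed);
}
```

Because the CAS fails without modifying the word when waiters are recorded, the
lock is still held at the point where the slow path begins.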

Right now (with FUTEX_WAKE), there's a great deal of latency between the atomic
operation that releases the lock and the FUTEX_WAKE being dispatched, due to
kernel entry overhead, futex hash overhead, etc. During that window, a thread
which is not a waiter can race for the lock and acquire it first, despite there
being waiters. This acquisition inherently happens with very low latency, but I
think it's actually likely to be bad for performance:

If the thread which "stole" the lock has not released it by the time the thread
woken by FUTEX_WAKE gets scheduled, the latter thread will uselessly contend
for the lock again, imposing additional cache synchronization overhead and an
additional syscall to wait on the futex again. It will also wrongly get moved
to the end of the wait queue.

If, on the other hand, the thread which "stole" the lock releases it
immediately, before the woken thread gets scheduled, my understanding is that
the stealing thread will see that there are waiters and issue an additional
FUTEX_WAKE at unlock time. At the very least this is a wasted syscall. If there
actually are two or more waiters, it's a lot more expensive, since an extra
thread wakes up only to contend for the lock and re-wait.

As both of these situations seem undesirable to me, I think the optimal
behavior should be to minimize the latency between the atomic-release operation
that makes the lock available to other threads and the futex wake. And the only
way to make this latency small is to perform the atomic release in kernel
space.
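
A rough sketch of what such a kernel-side release could look like on Linux,
assuming the same hypothetical lock-word encoding (0 = unlocked, 1 = locked,
2 = locked with possible waiters) and assuming it is acceptable to pass the
lock word as both futex addresses. The function name is made up for
illustration, and this is not a tested replacement for the existing unlock
path:

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Slow-path unlock sketch: ask the kernel to store 0 into the lock word
   and wake at most one waiter on it within one FUTEX_WAKE_OP call.
   Returns the number of waiters woken, or -1 with errno set. */
static long unlock_with_wake_op(atomic_int *lock)
{
    /* Operation: store 0 into *uaddr2. The comparison "old value < 0"
       never holds for the states 0, 1, 2 assumed here, so the second,
       conditional wake on uaddr2 is never triggered. */
    int op = FUTEX_OP(FUTEX_OP_SET, 0, FUTEX_OP_CMP_LT, 0);

    return syscall(SYS_futex,
                   lock,          /* uaddr: wake waiters queued here     */
                   FUTEX_WAKE_OP,
                   1,             /* wake at most one waiter on uaddr    */
                   0UL,           /* val2: wake count for uaddr2, unused */
                   lock,          /* uaddr2: word the operation modifies */
                   op);
}
```

With this shape, userspace never stores to the lock word on the contended
path; the release and the wake are issued by the same syscall.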

> I still think that the only thing we need to fix is to make sure that no
> program can interpret a spurious wake-up (by a pending futex_wake) as a real
> wake-up.

As I understand it, all of the current code treats futex wakes much like POSIX
condition variable waits: as an indication to re-check an external predicate
rather than as the bearer of notification about state. If not, things would
already be a lot more broken than they are now with regard to this issue.
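
To illustrate that pattern, here is a simplified lock-acquire loop in the style
of the well-known three-state futex mutex (0 = unlocked, 1 = locked, 2 = locked
with possible waiters). It is a sketch under that assumed encoding, with a
hypothetical function name, and is not the actual glibc code:

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Acquire sketch: a return from FUTEX_WAIT is never taken as proof that
   the lock is free. Ownership is only ever established by the atomic
   exchange, which is re-run after every wake, spurious or not. */
static void lock_acquire(atomic_int *lock)
{
    int expected = 0;

    /* Uncontended case: 0 -> 1, no syscall. */
    if (atomic_compare_exchange_strong(lock, &expected, 1))
        return;

    /* Contended case: mark the word as having waiters. If the previous
       value was 0, the lock is now ours; otherwise sleep and re-check. */
    while (atomic_exchange(lock, 2) != 0) {
        /* Returns immediately if *lock is no longer 2. */
        syscall(SYS_futex, lock, FUTEX_WAIT, 2, NULL, NULL, 0);
    }
}
```

A thread woken by a stale or unrelated wake simply loops, finds the lock still
held, and goes back to sleep.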

On the other hand, if you eliminate all sources of spurious wakes, I think it's
possible to achieve better behavior; in particular I think it may be possible
to prevent "stealing" of locks entirely and ensure that the next futex waiter
always gets the lock on unlock. Whether this behavior is desirable for glibc or
not, I'm not sure. I'm going to do research on it as a future possibility for
musl.
