This changes the generic C implementation of semaphores to use different algorithms. The core issue of the old implementation was that it kept two fields for both the value of the semaphore (ie, how many tokens it has that can be acquired by sem_wait calls) and the number of threads waiting to acquire a token using a futex_wait. We want to track the latter to be able to avoid calling futex_wake if there is actually no waiter. We can't put both of these counters into one 32b field because of the number of threads and or tokens we need to support. When users try to use sem_wait in the way discussed in http://austingroupbugs.net/view.php?id=811 then one thread calls sem_post, and another calls sem_wait *and* expects that once it has the token (ie, sem_wait has returned), it can call sem_destroy. Therefore, the implementation needs to ensure that sem_post makes no access to the semaphore data structure after it has provided a token. If we have 64b atomic operations, we can atomically increment the value of the semaphore and load the field counting the number of waiters. If we just have 32b operations, we need to do something more involved. The key here is to track a conservative value for the number of waiters in one bit of the value, and use that. Because it's a conservative value, it may be wrong and we may call futex_wake unnecessarily under some conditions. When we reset this bit (ie, speculate that there are no other waiters), we can be wrong but can make this misspeculation harmless if we wake as many waiters as there are tokens available. We do all these latter steps in sem_wait to still satisfy the requirement for sem_post regarding not accessing the semaphore after providing a token. Detailed comments can be found in the patch itself; see nptl/sem_wait.c. The semaphore data structure is adapted accordingly, as are other related functions: sem_open, sem_init, sem_getvalue, sem_trywait, sem_timedwait. because sem_trywait and sem_timedwait are very small, it seemed best to simply put them all into sem_wait.c I removed nptl/DESIGN-sem.txt because it was outdated and the comments in the code provide much more detail. The changes to tst-sem11 and tst-sem13 are simple adaptions; they peek inside the semaphore data structure. The custom assembler implementations on x86, x86_64, and sh have been removed. alpha and powerpc had custom sem_post implementations, but all that those were lacking compared to the generic version was a write barrier in sem_post. The new algorithm is correctly synchronized; for the old version of sem_post, I have added a call to atomic_write_barrier directly. This allows us to remove the custom alpha and powerpc variants. sparc still has its own semaphore implementation. I'd like to get feedback from sparc maintainers regarding what should happen with it. Note that this patch set is on top of the patch providing a new internal futex API: https://sourceware.org/ml/libc-alpha/2014-12/msg00154.html Tested on x86 and x86_64. I have not tested the old versions (__old_sem*) yet. [BZ #12674] * nptl/sem_wait.c(__new_sem_wait_fast): New function. Implement new semaphore algorithms. (__new_sem_wait_slow): New function. (__sem_wait_32_finish): New function. (__sem_wait_cleanup): Adapt. (do_futex_wait_inner): Adapt. (do_futex_wait): Adapt. (__new_sem_wait): Adapt. (sem_timedwait): New function. (__new_sem_trywait): New function. (__old_sem_trywait): Moved here from nptl/sem_trywait.c. * nptl/sem_post.c (__new_sem_post): Adapt. (__old_sem_post): Add release fence. * nptl/sem_open.c (sem_open): Adapt. * nptl/sem_init.c (__new_sem_init): Adapt. * nptl/sem_getvalue.c (__new_sem_getvalue): Adapt. (__old_sem_getvalue): Add using previous code. * sysdeps/nptl/internaltypes.h: Adapt. * nptl/tst-sem13.c (do_test): Adapt. * nptl/tst-sem11.c (main): Adapt. * nptl/sem_timedwait.c: Remove. * nptl/sem_trywait.c: Remove. * nptl/DESIGN-sem.txt: Remove. * nptl/Makefile (libpthread-routines): Remove sem_trywait, sem_wait. (gen-as-const-headers): Remove structsem.sym. * nptl/structsem.sym: Remove. * sysdeps/unix/sysv/linux/alpha/sem_post.c: Remove. * sysdeps/unix/sysv/linux/i386/i486/sem_post.S: Remove. * sysdeps/unix/sysv/linux/i386/i486/sem_timedwait.S: Remove. * sysdeps/unix/sysv/linux/i386/i486/sem_trywait.S: Remove. * sysdeps/unix/sysv/linux/i386/i486/sem_wait.S: Remove. * sysdeps/unix/sysv/linux/i386/i586/sem_post.S: Remove. * sysdeps/unix/sysv/linux/i386/i586/sem_timedwait.S: Remove. * sysdeps/unix/sysv/linux/i386/i586/sem_trywait.S: Remove. * sysdeps/unix/sysv/linux/i386/i586/sem_wait.S: Remove. * sysdeps/unix/sysv/linux/i386/i686/sem_post.S: Remove. * sysdeps/unix/sysv/linux/i386/i686/sem_timedwait.S: Remove. * sysdeps/unix/sysv/linux/i386/i686/sem_trywait.S: Remove. * sysdeps/unix/sysv/linux/i386/i686/sem_wait.S: Remove. * sysdeps/unix/sysv/linux/powerpc/sem_post.c: Remove. * sysdeps/unix/sysv/linux/sh/sem_post.S: Remove. * sysdeps/unix/sysv/linux/sh/sem_timedwait.S: Remove. * sysdeps/unix/sysv/linux/sh/sem_trywait.S: Remove. * sysdeps/unix/sysv/linux/sh/sem_wait.S: Remove. * sysdeps/unix/sysv/linux/x86_64/sem_post.S: Remove. * sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S: Remove. * sysdeps/unix/sysv/linux/x86_64/sem_trywait.S: Remove. * sysdeps/unix/sysv/linux/x86_64/sem_wait.S: Remove.