* PI mutex support for pthread_cond_* now in nptl @ 2013-02-18 10:56 Siddhesh Poyarekar 2013-02-18 23:50 ` Joseph S. Myers ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Siddhesh Poyarekar @ 2013-02-18 10:56 UTC (permalink / raw) To: libc-ports; +Cc: libc-alpha Hi arch-maintainers! The x86 assembly code for pthread_cond_* functions has, since some time now, been capable of utilizing PI mutexes (using PTHREAD_PRIO_INHERIT when initializing a mutex) optimally by using the FUTEX_WAIT_REQUEUE_PI and its companion FUTEX_CMP_REQUEUE_PI to requeue waiters woken on pthread_cond_broadcast to the PI mutex while waking just one child, hence avoiding the thundering herd effect. This code is now available on the default C implementation in master, which is used by most non-x86 architectures in glibc. To allow your architecture to use this, you need to define the following: - lll_futex_wait_requeue_pi macro to call the FUTEX_WAIT_REQUEUE_PI operation in a futex syscall. - lll_futex_timed_wait_requeue_pi, which is the timed equivalent of the above. - lll_futex_cmp_requeue_pi to call FUTEX_CMP_REQUEUE_PI. I have committed implementations for s390 and powerpc, which can be used as examples for implementation on other architectures. Siddhesh ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-18 10:56 PI mutex support for pthread_cond_* now in nptl Siddhesh Poyarekar @ 2013-02-18 23:50 ` Joseph S. Myers 2013-02-19 0:50 ` Joseph S. Myers 2013-02-19 16:42 ` Richard Henderson [not found] ` <5123AB55.2070100__45742.2411222526$1361292204$gmane$org@twiddle.net> 2 siblings, 1 reply; 18+ messages in thread From: Joseph S. Myers @ 2013-02-18 23:50 UTC (permalink / raw) To: libc-ports I've applied this patch to implement this for ARM. diff --git a/ports/ChangeLog.arm b/ports/ChangeLog.arm index 3a5880b..5b5220f 100644 --- a/ports/ChangeLog.arm +++ b/ports/ChangeLog.arm @@ -1,3 +1,13 @@ +2013-02-18 Joseph Myers <joseph@codesourcery.com> + + [BZ #14920] + * sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.h + (FUTEX_WAIT_REQUEUE_PI): Define. + (FUTEX_CMP_REQUEUE_PI): Likewise. + (lll_futex_wait_requeue_pi): Likewise. + (lll_futex_timed_wait_requeue_pi): Likewise. + (lll_futex_cmp_requeue_pi): Likewise. + 2013-02-18 Siddhesh Poyarekar <siddhesh@redhat.com> * sysdeps/unix/sysv/linux/arm/nptl/libc.abilist: Add diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.h b/ports/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.h index 0278812..fe21cda 100644 --- a/ports/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.h +++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.h @@ -36,6 +36,8 @@ #define FUTEX_TRYLOCK_PI 8 #define FUTEX_WAIT_BITSET 9 #define FUTEX_WAKE_BITSET 10 +#define FUTEX_WAIT_REQUEUE_PI 11 +#define FUTEX_CMP_REQUEUE_PI 12 #define FUTEX_PRIVATE_FLAG 128 #define FUTEX_CLOCK_REALTIME 256 @@ -141,6 +143,32 @@ INTERNAL_SYSCALL_ERROR_P (__ret, __err); \ }) +/* Priority Inheritance support. */ +#define lll_futex_wait_requeue_pi(futexp, val, mutex, private) \ + lll_futex_timed_wait_requeue_pi (futexp, val, NULL, 0, mutex, private) + +#define lll_futex_timed_wait_requeue_pi(futexp, val, timespec, clockbit, \ + mutex, private) \ + ({ \ + INTERNAL_SYSCALL_DECL (__err); \ + int __op = FUTEX_WAIT_REQUEUE_PI | clockbit; \ + \ + INTERNAL_SYSCALL (futex, __err, 5, (futexp), \ + __lll_private_flag (__op, private), \ + (val), (timespec), mutex); \ + }) + +#define lll_futex_cmp_requeue_pi(futexp, nr_wake, nr_move, mutex, val, priv) \ + ({ \ + INTERNAL_SYSCALL_DECL (__err); \ + long int __ret; \ + \ + __ret = INTERNAL_SYSCALL (futex, __err, 6, (futexp), \ + __lll_private_flag (FUTEX_CMP_REQUEUE_PI, priv),\ + (nr_wake), (nr_move), (mutex), (val)); \ + INTERNAL_SYSCALL_ERROR_P (__ret, __err); \ + }) + #define lll_trylock(lock) \ atomic_compare_and_exchange_val_acq(&(lock), 1, 0) -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-18 23:50 ` Joseph S. Myers @ 2013-02-19 0:50 ` Joseph S. Myers 0 siblings, 0 replies; 18+ messages in thread From: Joseph S. Myers @ 2013-02-19 0:50 UTC (permalink / raw) To: libc-ports I've applied this patch with the MIPS implementation. diff --git a/ports/ChangeLog.mips b/ports/ChangeLog.mips index de967c6..c07bf97 100644 --- a/ports/ChangeLog.mips +++ b/ports/ChangeLog.mips @@ -1,3 +1,13 @@ +2013-02-19 Joseph Myers <joseph@codesourcery.com> + + [BZ #14920] + * sysdeps/unix/sysv/linux/mips/nptl/lowlevellock.h + (FUTEX_WAIT_REQUEUE_PI): Define. + (FUTEX_CMP_REQUEUE_PI): Likewise. + (lll_futex_wait_requeue_pi): Likewise. + (lll_futex_timed_wait_requeue_pi): Likewise. + (lll_futex_cmp_requeue_pi): Likewise. + 2013-02-18 Siddhesh Poyarekar <siddhesh@redhat.com> * sysdeps/unix/sysv/linux/mips/mips32/nptl/libc.abilist: Add diff --git a/ports/sysdeps/unix/sysv/linux/mips/nptl/lowlevellock.h b/ports/sysdeps/unix/sysv/linux/mips/nptl/lowlevellock.h index fe7c747..4f27b8b 100644 --- a/ports/sysdeps/unix/sysv/linux/mips/nptl/lowlevellock.h +++ b/ports/sysdeps/unix/sysv/linux/mips/nptl/lowlevellock.h @@ -36,6 +36,8 @@ #define FUTEX_TRYLOCK_PI 8 #define FUTEX_WAIT_BITSET 9 #define FUTEX_WAKE_BITSET 10 +#define FUTEX_WAIT_REQUEUE_PI 11 +#define FUTEX_CMP_REQUEUE_PI 12 #define FUTEX_PRIVATE_FLAG 128 #define FUTEX_CLOCK_REALTIME 256 @@ -141,6 +143,34 @@ INTERNAL_SYSCALL_ERROR_P (__ret, __err); \ }) +/* Priority Inheritance support. */ +#define lll_futex_wait_requeue_pi(futexp, val, mutex, private) \ + lll_futex_timed_wait_requeue_pi (futexp, val, NULL, 0, mutex, private) + +#define lll_futex_timed_wait_requeue_pi(futexp, val, timespec, clockbit, \ + mutex, private) \ + ({ \ + INTERNAL_SYSCALL_DECL (__err); \ + long int __ret; \ + int __op = FUTEX_WAIT_REQUEUE_PI | clockbit; \ + \ + __ret = INTERNAL_SYSCALL (futex, __err, 5, (futexp), \ + __lll_private_flag (__op, private), \ + (val), (timespec), mutex); \ + INTERNAL_SYSCALL_ERROR_P (__ret, __err) ? -__ret : __ret; \ + }) + +#define lll_futex_cmp_requeue_pi(futexp, nr_wake, nr_move, mutex, val, priv) \ + ({ \ + INTERNAL_SYSCALL_DECL (__err); \ + long int __ret; \ + \ + __ret = INTERNAL_SYSCALL (futex, __err, 6, (futexp), \ + __lll_private_flag (FUTEX_CMP_REQUEUE_PI, priv),\ + (nr_wake), (nr_move), (mutex), (val)); \ + INTERNAL_SYSCALL_ERROR_P (__ret, __err); \ + }) + static inline int __attribute__((always_inline)) __lll_trylock(int *futex) { -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-18 10:56 PI mutex support for pthread_cond_* now in nptl Siddhesh Poyarekar 2013-02-18 23:50 ` Joseph S. Myers @ 2013-02-19 16:42 ` Richard Henderson 2013-02-19 16:49 ` Siddhesh Poyarekar 2013-02-19 17:18 ` Joseph S. Myers [not found] ` <5123AB55.2070100__45742.2411222526$1361292204$gmane$org@twiddle.net> 2 siblings, 2 replies; 18+ messages in thread From: Richard Henderson @ 2013-02-19 16:42 UTC (permalink / raw) To: Siddhesh Poyarekar; +Cc: libc-ports, libc-alpha On 2013-02-18 02:56, Siddhesh Poyarekar wrote: > Hi arch-maintainers! > > The x86 assembly code for pthread_cond_* functions has, since some > time now, been capable of utilizing PI mutexes (using > PTHREAD_PRIO_INHERIT when initializing a mutex) optimally by using the > FUTEX_WAIT_REQUEUE_PI and its companion FUTEX_CMP_REQUEUE_PI to > requeue waiters woken on pthread_cond_broadcast to the PI mutex while > waking just one child, hence avoiding the thundering herd effect. > This code is now available on the default C implementation in master, > which is used by most non-x86 architectures in glibc. > > To allow your architecture to use this, you need to define the > following: > > - lll_futex_wait_requeue_pi macro to call the FUTEX_WAIT_REQUEUE_PI > operation in a futex syscall. > > - lll_futex_timed_wait_requeue_pi, which is the timed equivalent of > the above. > > - lll_futex_cmp_requeue_pi to call FUTEX_CMP_REQUEUE_PI. > > I have committed implementations for s390 and powerpc, which can be > used as examples for implementation on other architectures. Any chance we can move these macros into a generic linux header? Given that we're using INTERNAL_SYSCALL macros, the definitions ought to be the same for all targets. r~ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-19 16:42 ` Richard Henderson @ 2013-02-19 16:49 ` Siddhesh Poyarekar 2013-02-19 17:18 ` Joseph S. Myers 1 sibling, 0 replies; 18+ messages in thread From: Siddhesh Poyarekar @ 2013-02-19 16:49 UTC (permalink / raw) To: Richard Henderson; +Cc: libc-ports, libc-alpha On Tue, Feb 19, 2013 at 08:41:57AM -0800, Richard Henderson wrote: > Any chance we can move these macros into a generic linux header? > Given that we're using INTERNAL_SYSCALL macros, the definitions ought > to be the same for all targets. It is on my TODO list, but not near the top as of now. Maybe someone else might want to get to it before I do. Siddhesh ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-19 16:42 ` Richard Henderson 2013-02-19 16:49 ` Siddhesh Poyarekar @ 2013-02-19 17:18 ` Joseph S. Myers 2013-02-19 17:21 ` Siddhesh Poyarekar ` (2 more replies) 1 sibling, 3 replies; 18+ messages in thread From: Joseph S. Myers @ 2013-02-19 17:18 UTC (permalink / raw) To: Richard Henderson; +Cc: Siddhesh Poyarekar, libc-ports, libc-alpha On Tue, 19 Feb 2013, Richard Henderson wrote: > Any chance we can move these macros into a generic linux header? > Given that we're using INTERNAL_SYSCALL macros, the definitions ought to be > the same for all targets. Generally most of lowlevellock.h should probably be shared between architectures. (If some architectures don't implement a particular feature as of a particular kernel version, that's a matter for kernel-features.h and __ASSUME_* conditionals.) There are however two different groups of those headers, those where e.g. lll_futex_timed_wait ends with INTERNAL_SYSCALL_ERROR_P (__ret, __err) ? -__ret : __ret; (e.g. powerpc) and those (e.g. s390) where it just ends with a call to INTERNAL_SYSCALL and relies on the return value of INTERNAL_SYSCALL being the correct return from the macro. I'm not sure whether those differences actually serve a useful purpose or whether the return value of INTERNAL_SYSCALL differs in a relevant way between architectures. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-19 17:18 ` Joseph S. Myers @ 2013-02-19 17:21 ` Siddhesh Poyarekar 2013-02-19 20:06 ` Torvald Riegel [not found] ` <1361304381.581.80.camel__6928.53579898856$1361304432$gmane$org@triegel.csb> 2 siblings, 0 replies; 18+ messages in thread From: Siddhesh Poyarekar @ 2013-02-19 17:21 UTC (permalink / raw) To: Joseph S. Myers; +Cc: Richard Henderson, libc-ports, libc-alpha On Tue, Feb 19, 2013 at 05:18:20PM +0000, Joseph S. Myers wrote: > There are however two different groups of those headers, those where e.g. > lll_futex_timed_wait ends with > > INTERNAL_SYSCALL_ERROR_P (__ret, __err) ? -__ret : __ret; > > (e.g. powerpc) and those (e.g. s390) where it just ends with a call to > INTERNAL_SYSCALL and relies on the return value of INTERNAL_SYSCALL being > the correct return from the macro. I'm not sure whether those differences > actually serve a useful purpose or whether the return value of > INTERNAL_SYSCALL differs in a relevant way between architectures. From the very superficial analysis I had done last year, it should be possible to consolidate all that. Siddhesh ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-19 17:18 ` Joseph S. Myers 2013-02-19 17:21 ` Siddhesh Poyarekar @ 2013-02-19 20:06 ` Torvald Riegel 2013-02-20 18:13 ` Steven Munroe [not found] ` <1361304381.581.80.camel__6928.53579898856$1361304432$gmane$org@triegel.csb> 2 siblings, 1 reply; 18+ messages in thread From: Torvald Riegel @ 2013-02-19 20:06 UTC (permalink / raw) To: Joseph S. Myers Cc: Richard Henderson, Siddhesh Poyarekar, libc-ports, libc-alpha On Tue, 2013-02-19 at 17:18 +0000, Joseph S. Myers wrote: > On Tue, 19 Feb 2013, Richard Henderson wrote: > > > Any chance we can move these macros into a generic linux header? > > Given that we're using INTERNAL_SYSCALL macros, the definitions ought to be > > the same for all targets. > > Generally most of lowlevellock.h should probably be shared between > architectures. (If some architectures don't implement a particular > feature as of a particular kernel version, that's a matter for > kernel-features.h and __ASSUME_* conditionals.) On a related note: What are the reasons to have arch-specific assembler versions of many of the synchronization operations? I would be surprised if they'd provide a significant performance advantage; has anyone recent measurements for this? It seems to me that it would be useful to consolidate the different versions that exist for the synchronization operations into shared C code as long as this doesn't make a significant performance difference. They are all based on atomic operations and futex operations, both of which we have in C code (especially if we have compilers that support the C11 memory model). Or are there other reasons for keeping different versions that I'm not aware of? Torvald ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-19 20:06 ` Torvald Riegel @ 2013-02-20 18:13 ` Steven Munroe 2013-02-20 20:25 ` Torvald Riegel 0 siblings, 1 reply; 18+ messages in thread From: Steven Munroe @ 2013-02-20 18:13 UTC (permalink / raw) To: Torvald Riegel Cc: Joseph S. Myers, Richard Henderson, Siddhesh Poyarekar, libc-ports, libc-alpha On Tue, 2013-02-19 at 21:06 +0100, Torvald Riegel wrote: > On Tue, 2013-02-19 at 17:18 +0000, Joseph S. Myers wrote: > > On Tue, 19 Feb 2013, Richard Henderson wrote: > > > > > Any chance we can move these macros into a generic linux header? > > > Given that we're using INTERNAL_SYSCALL macros, the definitions ought to be > > > the same for all targets. > > > > Generally most of lowlevellock.h should probably be shared between > > architectures. (If some architectures don't implement a particular > > feature as of a particular kernel version, that's a matter for > > kernel-features.h and __ASSUME_* conditionals.) > > On a related note: What are the reasons to have arch-specific assembler > versions of many of the synchronization operations? I would be > surprised if they'd provide a significant performance advantage; has > anyone recent measurements for this? > The introduction of GCC compiler builtins like __sync is fairly recent and the new __atomic builtins start with GCC-4.7. So until recently we had no choice. For platforms (like PowerPC) that implement acquire/release the GCC __sync builtins are not sufficient and GCC-4.7 __atomic builtins are not pervasive enough to make that the default. > It seems to me that it would be useful to consolidate the different > versions that exist for the synchronization operations into shared C > code as long as this doesn't make a significant performance difference. > They are all based on atomic operations and futex operations, both of > which we have in C code (especially if we have compilers that support > the C11 memory model). Or are there other reasons for keeping different > versions that I'm not aware of? > I disagree. The performance of lowlevellocks and associated platform specific optimizations are too import to move forward with the consolidation you suggest. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-20 18:13 ` Steven Munroe @ 2013-02-20 20:25 ` Torvald Riegel 2013-02-20 22:04 ` Roland McGrath ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Torvald Riegel @ 2013-02-20 20:25 UTC (permalink / raw) To: munroesj Cc: Joseph S. Myers, Richard Henderson, Siddhesh Poyarekar, libc-ports, libc-alpha On Wed, 2013-02-20 at 10:59 -0600, Steven Munroe wrote: > On Tue, 2013-02-19 at 21:06 +0100, Torvald Riegel wrote: > > On Tue, 2013-02-19 at 17:18 +0000, Joseph S. Myers wrote: > > > On Tue, 19 Feb 2013, Richard Henderson wrote: > > > > > > > Any chance we can move these macros into a generic linux header? > > > > Given that we're using INTERNAL_SYSCALL macros, the definitions ought to be > > > > the same for all targets. > > > > > > Generally most of lowlevellock.h should probably be shared between > > > architectures. (If some architectures don't implement a particular > > > feature as of a particular kernel version, that's a matter for > > > kernel-features.h and __ASSUME_* conditionals.) > > > > On a related note: What are the reasons to have arch-specific assembler > > versions of many of the synchronization operations? I would be > > surprised if they'd provide a significant performance advantage; has > > anyone recent measurements for this? > > > The introduction of GCC compiler builtins like __sync is fairly recent > and the new __atomic builtins start with GCC-4.7. So until recently we > had no choice. Using assembler for the atomic operations is possible (e.g., as in Boehm's libatomic-ops, or in./sysdeps/powerpc/bits/atomic.h and others). It doesn't allow for the same level of compiler optimization across barriers, but it's unclear whether that has much benefit, and GCC doesn't do it yet anyway. There are some cases in which compilers that don't support the C11/C++11 memory model can generate code that wouldn't be correct in such a model, and which can theoretically interfere with other concurrent code (e.g., introduce data races due to accesses being too wide). However, because we don't have custom assembler for everything, we should be already exposed to that. > For platforms (like PowerPC) that implement acquire/release the GCC > __sync builtins are not sufficient and GCC-4.7 __atomic builtins are not > pervasive enough to make that the default. I agree regarding the __sync builtins, but using assembler in place of the __atomic builtins should work, or not? > > It seems to me that it would be useful to consolidate the different > > versions that exist for the synchronization operations into shared C > > code as long as this doesn't make a significant performance difference. > > They are all based on atomic operations and futex operations, both of > > which we have in C code (especially if we have compilers that support > > the C11 memory model). Or are there other reasons for keeping different > > versions that I'm not aware of? > > > I disagree. The performance of lowlevellocks and associated platform > specific optimizations are too import to move forward with the > consolidation you suggest. Which specific optimizations do you refer to? I didn't see any for powerpc, for example (i.e., the lock fast path is C up to the point of the atomic operation). The ones that I saw are for x86, and I'm wondering whether they provide much benefit. Especially because this can mostly just matter for the execution path taken when a free lock is acquired; once you get any cache miss, you're to some extent on the slow path anyway. Also, for the Linux platforms I looked at, the mutex algorithms are the same. Do you have any recent measurements (or could point to them) that show the benefit of the optimizations you refer to? For example, we've spent quite some time debugging a PI cond var failure in the past, and this wasn't made any easier by having several (different) versions of the cond var implementation. Torvald ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-20 20:25 ` Torvald Riegel @ 2013-02-20 22:04 ` Roland McGrath 2013-02-20 22:41 ` Steven Munroe 2013-02-21 4:32 ` Siddhesh Poyarekar 2 siblings, 0 replies; 18+ messages in thread From: Roland McGrath @ 2013-02-20 22:04 UTC (permalink / raw) To: Torvald Riegel; +Cc: libc-ports, libc-alpha At least the x86 assembly versions were written because the code produced by GCC at the time was drastically worse. But that was long ago. We now mandate compiling with GCC 4.3 or newer and recommend 4.6 or newer, so it merits examining the code produced by those compilers and reconsidering the judgments of the past. But we will be conservative and not drop any hand-written assembly unless recent compilers at -O2 are demonstrated to produce code at least as good. (Perhaps the reasonable standard is something like at least as good from 4.6 -O3 and not a whole lot worse from 4.3 -O2.) As time goes on, we can of course examine then-common compilers and reconsider these judgments. As of now, even 4.7 is not common yet. Thanks, Roland ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-20 20:25 ` Torvald Riegel 2013-02-20 22:04 ` Roland McGrath @ 2013-02-20 22:41 ` Steven Munroe 2013-02-21 4:32 ` Siddhesh Poyarekar 2 siblings, 0 replies; 18+ messages in thread From: Steven Munroe @ 2013-02-20 22:41 UTC (permalink / raw) To: Torvald Riegel Cc: munroesj, Joseph S. Myers, Richard Henderson, Siddhesh Poyarekar, libc-ports, libc-alpha On Wed, 2013-02-20 at 21:25 +0100, Torvald Riegel wrote: > On Wed, 2013-02-20 at 10:59 -0600, Steven Munroe wrote: > > On Tue, 2013-02-19 at 21:06 +0100, Torvald Riegel wrote: > > > On Tue, 2013-02-19 at 17:18 +0000, Joseph S. Myers wrote: > > > > On Tue, 19 Feb 2013, Richard Henderson wrote: > > > > > > > > > Any chance we can move these macros into a generic linux header? > > > > > Given that we're using INTERNAL_SYSCALL macros, the definitions ought to be > > > > > the same for all targets. > > > > > > > > Generally most of lowlevellock.h should probably be shared between > > > > architectures. (If some architectures don't implement a particular > > > > feature as of a particular kernel version, that's a matter for > > > > kernel-features.h and __ASSUME_* conditionals.) > > > > > > On a related note: What are the reasons to have arch-specific assembler > > > versions of many of the synchronization operations? I would be > > > surprised if they'd provide a significant performance advantage; has > > > anyone recent measurements for this? > > > > > The introduction of GCC compiler builtins like __sync is fairly recent > > and the new __atomic builtins start with GCC-4.7. So until recently we > > had no choice. > > Using assembler for the atomic operations is possible (e.g., as in > Boehm's libatomic-ops, or in./sysdeps/powerpc/bits/atomic.h and others). > It doesn't allow for the same level of compiler optimization across > barriers, but it's unclear whether that has much benefit, and GCC > doesn't do it yet anyway. > > There are some cases in which compilers that don't support the C11/C++11 > memory model can generate code that wouldn't be correct in such a model, > and which can theoretically interfere with other concurrent code (e.g., > introduce data races due to accesses being too wide). However, because > we don't have custom assembler for everything, we should be already > exposed to that. > > > For platforms (like PowerPC) that implement acquire/release the GCC > > __sync builtins are not sufficient and GCC-4.7 __atomic builtins are not > > pervasive enough to make that the default. > > I agree regarding the __sync builtins, but using assembler in place of > the __atomic builtins should work, or not? > > > > It seems to me that it would be useful to consolidate the different > > > versions that exist for the synchronization operations into shared C > > > code as long as this doesn't make a significant performance difference. > > > They are all based on atomic operations and futex operations, both of > > > which we have in C code (especially if we have compilers that support > > > the C11 memory model). Or are there other reasons for keeping different > > > versions that I'm not aware of? > > > > > I disagree. The performance of lowlevellocks and associated platform > > specific optimizations are too import to move forward with the > > consolidation you suggest. > > Which specific optimizations do you refer to? I didn't see any for > powerpc, for example (i.e., the lock fast path is C up to the point of > the atomic operation). The ones that I saw are for x86, and I'm > wondering whether they provide much benefit. Especially because this > can mostly just matter for the execution path taken when a free lock is > acquired; once you get any cache miss, you're to some extent on the slow > path anyway. Also, for the Linux platforms I looked at, the mutex > algorithms are the same. > Like the lwarx MUTEX_HINT (EH field) hint. > Do you have any recent measurements (or could point to them) that show > the benefit of the optimizations you refer to? > No. I don't current have access to a machine big enough show this effect and I cant tell you about the specific customer. So you will have to trust me on this. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-20 20:25 ` Torvald Riegel 2013-02-20 22:04 ` Roland McGrath 2013-02-20 22:41 ` Steven Munroe @ 2013-02-21 4:32 ` Siddhesh Poyarekar 2 siblings, 0 replies; 18+ messages in thread From: Siddhesh Poyarekar @ 2013-02-21 4:32 UTC (permalink / raw) To: Torvald Riegel Cc: munroesj, Joseph S. Myers, Richard Henderson, libc-ports, libc-alpha On Wed, Feb 20, 2013 at 09:25:26PM +0100, Torvald Riegel wrote: > For example, we've spent quite some time debugging a PI cond var failure > in the past, and this wasn't made any easier by having several > (different) versions of the cond var implementation. I'd like to think that at least the variations are gone now with my patches, so if you're looking to only do an analysis of the algorithm, the C code should give sufficient insight into what the architecture-specific code does. Siddhesh ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <1361304381.581.80.camel__6928.53579898856$1361304432$gmane$org@triegel.csb>]
* Re: PI mutex support for pthread_cond_* now in nptl [not found] ` <1361304381.581.80.camel__6928.53579898856$1361304432$gmane$org@triegel.csb> @ 2013-02-22 4:11 ` Andi Kleen 0 siblings, 0 replies; 18+ messages in thread From: Andi Kleen @ 2013-02-22 4:11 UTC (permalink / raw) To: Torvald Riegel Cc: Joseph S. Myers, Richard Henderson, Siddhesh Poyarekar, libc-ports, libc-alpha Torvald Riegel <triegel@redhat.com> writes: > On a related note: What are the reasons to have arch-specific assembler > versions of many of the synchronization operations? I would be > surprised if they'd provide a significant performance advantage; has > anyone recent measurements for this? Yes changing the x86 assembler code is a big pain and I bet it prevents a lot of optimizations because it's hard to change. -Andi -- ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <5123AB55.2070100__45742.2411222526$1361292204$gmane$org@twiddle.net>]
* Re: PI mutex support for pthread_cond_* now in nptl [not found] ` <5123AB55.2070100__45742.2411222526$1361292204$gmane$org@twiddle.net> @ 2013-02-19 17:01 ` Andreas Schwab 2013-02-19 17:18 ` Siddhesh Poyarekar 0 siblings, 1 reply; 18+ messages in thread From: Andreas Schwab @ 2013-02-19 17:01 UTC (permalink / raw) To: Richard Henderson; +Cc: Siddhesh Poyarekar, libc-ports, libc-alpha Richard Henderson <rth@twiddle.net> writes: > Any chance we can move these macros into a generic linux header? What would that do to architectures that don't implement PI futexes? Andreas. -- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different." ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-19 17:01 ` Andreas Schwab @ 2013-02-19 17:18 ` Siddhesh Poyarekar 2013-02-19 17:39 ` Andreas Schwab 0 siblings, 1 reply; 18+ messages in thread From: Siddhesh Poyarekar @ 2013-02-19 17:18 UTC (permalink / raw) To: Andreas Schwab; +Cc: Richard Henderson, libc-ports, libc-alpha On Tue, Feb 19, 2013 at 06:01:13PM +0100, Andreas Schwab wrote: > What would that do to architectures that don't implement PI futexes? > They get them implemented for free. As far as the linux kernel is concerned, PI futexes are available on all architectures. Siddhesh ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-19 17:18 ` Siddhesh Poyarekar @ 2013-02-19 17:39 ` Andreas Schwab 2013-02-19 17:52 ` Siddhesh Poyarekar 0 siblings, 1 reply; 18+ messages in thread From: Andreas Schwab @ 2013-02-19 17:39 UTC (permalink / raw) To: Siddhesh Poyarekar; +Cc: Richard Henderson, libc-ports, libc-alpha Siddhesh Poyarekar <siddhesh@redhat.com> writes: > As far as the linux kernel is concerned, PI futexes are available on > all architectures. No. Andreas. -- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different." ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PI mutex support for pthread_cond_* now in nptl 2013-02-19 17:39 ` Andreas Schwab @ 2013-02-19 17:52 ` Siddhesh Poyarekar 0 siblings, 0 replies; 18+ messages in thread From: Siddhesh Poyarekar @ 2013-02-19 17:52 UTC (permalink / raw) To: Andreas Schwab Cc: Siddhesh Poyarekar, Richard Henderson, libc-ports, libc-alpha On 19 February 2013 23:09, Andreas Schwab <schwab@suse.de> wrote: > Siddhesh Poyarekar <siddhesh@redhat.com> writes: > >> As far as the linux kernel is concerned, PI futexes are available on >> all architectures. > > No. > Oh ok, it's not. It's a runtime check and not a compile-time check that I looked for. -- http://siddhesh.in ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2013-02-22 4:11 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-02-18 10:56 PI mutex support for pthread_cond_* now in nptl Siddhesh Poyarekar 2013-02-18 23:50 ` Joseph S. Myers 2013-02-19 0:50 ` Joseph S. Myers 2013-02-19 16:42 ` Richard Henderson 2013-02-19 16:49 ` Siddhesh Poyarekar 2013-02-19 17:18 ` Joseph S. Myers 2013-02-19 17:21 ` Siddhesh Poyarekar 2013-02-19 20:06 ` Torvald Riegel 2013-02-20 18:13 ` Steven Munroe 2013-02-20 20:25 ` Torvald Riegel 2013-02-20 22:04 ` Roland McGrath 2013-02-20 22:41 ` Steven Munroe 2013-02-21 4:32 ` Siddhesh Poyarekar [not found] ` <1361304381.581.80.camel__6928.53579898856$1361304432$gmane$org@triegel.csb> 2013-02-22 4:11 ` Andi Kleen [not found] ` <5123AB55.2070100__45742.2411222526$1361292204$gmane$org@twiddle.net> 2013-02-19 17:01 ` Andreas Schwab 2013-02-19 17:18 ` Siddhesh Poyarekar 2013-02-19 17:39 ` Andreas Schwab 2013-02-19 17:52 ` Siddhesh Poyarekar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).