Despite using the preferred atomic exchange in the initial check, the
loop was unconditionally using CAS, which is not desired on some
architectures (those that didn't set `ATOMIC_EXCHANGE_USES_CAS`).

No meaningful perf changes measured on Broadwell but still seems like a
reasonable change. Full check passes on x86-64.
---
 nptl/pthread_spin_lock.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/nptl/pthread_spin_lock.c b/nptl/pthread_spin_lock.c
index 19d1759f9a..1bdd6e2048 100644
--- a/nptl/pthread_spin_lock.c
+++ b/nptl/pthread_spin_lock.c
@@ -20,6 +20,20 @@
 #include "pthreadP.h"
 #include <shlib-compat.h>
 
+#if ATOMIC_EXCHANGE_USES_CAS
+/* Try to acquire the lock with a CAS instruction as this architecture
+   has no exchange instruction.  The acquisition succeeds if the lock is not
+   acquired.  */
+# define pthread_spin_lock_grab_lock(mem, val, c) \
+  atomic_compare_exchange_weak_acquire (lock, &val, 1))
+#else
+/* Try to acquire the lock with an exchange instruction as this architecture
+   has such an instruction and we assume it is faster than a CAS.
+   The acquisition succeeds if the lock is not in an acquired state.  */
+# define pthread_spin_lock_grab_lock(mem, val, c) \
+  (atomic_exchange_acquire (lock, 1) == 0)
+#endif
+
 int
 __pthread_spin_lock (pthread_spinlock_t *lock)
 {
@@ -36,19 +50,8 @@ __pthread_spin_lock (pthread_spinlock_t *lock)
      We use acquire MO to synchronize-with the release MO store in
      pthread_spin_unlock, and thus ensure that prior critical sections
      happen-before this critical section.  */
-#if ! ATOMIC_EXCHANGE_USES_CAS
-  /* Try to acquire the lock with an exchange instruction as this architecture
-     has such an instruction and we assume it is faster than a CAS.
-     The acquisition succeeds if the lock is not in an acquired state.  */
-  if (__glibc_likely (atomic_exchange_acquire (lock, 1) == 0))
+  if (__glibc_likely (pthread_spin_lock_grab_lock (lock, &val, 1)))
     return 0;
-#else
-  /* Try to acquire the lock with a CAS instruction as this architecture
-     has no exchange instruction.  The acquisition succeeds if the lock is not
-     acquired.  */
-  if (__glibc_likely (atomic_compare_exchange_weak_acquire (lock, &val, 1)))
-    return 0;
-#endif
 
   do
     {
@@ -75,7 +78,7 @@ __pthread_spin_lock (pthread_spinlock_t *lock)
       /* We need acquire memory order here for the same reason as mentioned
          for the first try to lock the spinlock.  */
     }
-  while (!atomic_compare_exchange_weak_acquire (lock, &val, 1));
+  while (!pthread_spin_lock_grab_lock (lock, &val, 1));
 
   return 0;
 }
-- 
2.34.1
Hi Noah,

Did you try building it both ways? I don't think this could ever compile:

+  if (__glibc_likely (pthread_spin_lock_grab_lock (lock, &val, 1)))

and:

+# define pthread_spin_lock_grab_lock(mem, val, c) \
+  atomic_compare_exchange_weak_acquire (lock, &val, 1))

The define uses 'lock' and 'mem' inconsistently and the use of the macro
expands into &&val...

Apart from that there is the question whether we should keep the weird
ATOMIC_EXCHANGE_USES_CAS setting - I have removed it in my atomic
patch series since most targets appear confused as to what it means (so
are likely to have the wrong setting). Also there is no evidence it is actually
faster. So using exchange in both cases is easier (and less error prone!).

Also you do realize that no matter how much you change this code, it
won't make a difference on x86, right?

Cheers,
Wilco
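[Editorial note: a standalone sketch of what a consistent version of the macro would look like. The names `spin_grab_lock` and `demo` are hypothetical, and it uses C11 `<stdatomic.h>` rather than glibc's internal atomic macros, so this is an illustration of Wilco's point, not the glibc code.]

```c
#include <stdatomic.h>

/* Both expansions use the macro parameters, and the stray ')' from the
   CAS branch of the patch is removed.  With C11 atomics, the CAS takes
   a pointer to the expected value, so the caller passes '&expected'
   rather than the macro expanding into '&&val'.  */
#if ATOMIC_EXCHANGE_USES_CAS
# define spin_grab_lock(mem, val) \
  atomic_compare_exchange_weak_explicit ((mem), (val), 1, \
                                         memory_order_acquire, \
                                         memory_order_relaxed)
#else
# define spin_grab_lock(mem, val) \
  (atomic_exchange_explicit ((mem), 1, memory_order_acquire) == 0)
#endif

/* Acquire and release an uncontended lock once; returns 0 on success,
   mirroring the pthread_spin_lock convention.  */
int
demo (void)
{
  static _Atomic int lock = 0;
  int expected = 0;

  if (!spin_grab_lock (&lock, &expected))
    return -1;  /* Lock was already held; cannot happen uncontended.  */

  /* Critical section would go here.  */

  atomic_store_explicit (&lock, 0, memory_order_release);
  return 0;
}
```

Note that with the exchange variant the second argument is simply unused, which is why a single macro signature can cover both branches.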
On Thu, Sep 29, 2022 at 9:35 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi Noah,
>
> Did you try building it both ways? I don't think this could ever compile:
>
> +  if (__glibc_likely (pthread_spin_lock_grab_lock (lock, &val, 1)))
>
> and:
>
> +# define pthread_spin_lock_grab_lock(mem, val, c) \
> +  atomic_compare_exchange_weak_acquire (lock, &val, 1))
>
> The define uses 'lock' and 'mem' inconsistently and the use of the macro
> expands into &&val...
>
> Apart from that there is the question whether we should keep the weird
> ATOMIC_EXCHANGE_USES_CAS setting - I have removed it in my atomic
> patch series since most targets appear confused as to what it means (so
> are likely to have the wrong setting). Also there is no evidence it is actually
> faster. So using exchange in both cases is easier (and less error prone!).
>
> Also you do realize that no matter how much you change this code, it
> won't make a difference on x86, right?

Why's that?

>
> Cheers,
> Wilco
Hi Noah,
Because various targets override it:
> find -name pthread_spin_lock.*
./nptl/pthread_spin_lock.c
./sysdeps/alpha/nptl/pthread_spin_lock.S
./sysdeps/powerpc/nptl/pthread_spin_lock.c
./sysdeps/i386/nptl/pthread_spin_lock.S
./sysdeps/sparc/sparc64/pthread_spin_lock.S
./sysdeps/sparc/sparc32/sparcv9/pthread_spin_lock.S
./sysdeps/sparc/sparc32/pthread_spin_lock.S
./sysdeps/x86_64/nptl/pthread_spin_lock.S
./sysdeps/ia64/nptl/pthread_spin_lock.c
./sysdeps/sh/nptl/pthread_spin_lock.c
Cheers,
Wilco
On Thu, Sep 29, 2022 at 11:40 AM Noah Goldstein via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Thu, Sep 29, 2022 at 9:35 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> >
> > Hi Noah,
> >
> > Did you try building it both ways? I don't think this could ever compile:
> >
> > + if (__glibc_likely (pthread_spin_lock_grab_lock (lock, &val, 1)))
> >
> > and:
> >
> > +# define pthread_spin_lock_grab_lock(mem, val, c) \
> > + atomic_compare_exchange_weak_acquire (lock, &val, 1))
> >
> > The define uses 'lock' and 'mem' inconsistently and the use of the macro
> > expands into &&val...
> >
> > Apart from that there is the question whether we should keep the weird
> > ATOMIC_EXCHANGE_USES_CAS setting - I have removed it in my atomic
> > patch series since most targets appear confused as to what it means (so
> > are likely to have the wrong setting). Also there is no evidence it is actually
> > faster. So using exchange in both cases is easier (and less error prone!).
> >
> > Also you do realize that no matter how much you change this code, it
> > won't make a difference on x86, right?
>
> Why's that?
x86-64 uses sysdeps/x86_64/nptl/pthread_spin_lock.S
--
H.J.