From: Stefan Liebler <stli@linux.vnet.ibm.com>
To: libc-alpha@sourceware.org
Subject: Re: [PATCH 2/2] S390: Test if lock is free before using atomic instruction in spin-lock.
Date: Fri, 09 Dec 2016 15:17:00 -0000 [thread overview]
Message-ID: <b493ee72-62f6-e630-1dfa-5a2f678cf481@linux.vnet.ibm.com> (raw)
In-Reply-To: <1479992650.7146.1535.camel@localhost.localdomain>
On 11/24/2016 02:04 PM, Torvald Riegel wrote:
> On Wed, 2016-11-23 at 16:09 +0100, Stefan Liebler wrote:
>> This patch changes the behaviour of pthread_spin_lock on s390:
>> Instead of spinning on the lock with compare-and-swap instruction,
>> the atomic_compare_and_exchange_bool_acq macro is used.
>> The s390 atomic_compare_and_exchange_bool_acq macro, called with constant
>> zero as oldval, first compares the lock with normal instructions. If it is
>> free, the compare-and-swap instruction is used to acquire the lock. While
>> the lock is held by one CPU, another CPU will not exclusively lock the
>> memory of the lock in a loop. If the lock is unlocked by another CPU,
>> this is observed by the normal instruction and the lock is acquired
>> with a compare-and-swap instruction.
>
> I don't want to have more arch-specific code than we really need.
> Please compare what you have against the generic spinlock code. If you
> find the generic spinlock code lacking, then please propose a change for
> it in a way that does not make things worse for other architectures.
>
> If there are arch-specific properties that significantly affect the
> approach the generic spinlock takes (eg, assumptions about CAS vs
> atomic-exchange runtime overheads), then please split them out.
>
> In the long term, this will also benefit s390. For example, the
> spinlock code you have has no backoff, and introduces spinning with
> loads in a pretty ugly way through the hack you have added to the CAS
> (first loading the lock's value and comparing it manually if the
> supplied argument is a constant).
> While the generic spinlock code we have is also not very great,
> improving it is what will make life easier for the whole glibc project.
> Also, if someone else improves the generic spinlock code, s390 would
> miss out on this if you have added your custom spinlock code.
>
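To make the "spinning with loads" idea concrete, here is a minimal sketch of a test-and-test-and-set lock using GCC's __atomic builtins. The function names and the plain-int lock type are illustrative assumptions, not glibc's actual code:

```c
#include <assert.h>

/* Attempt the exclusive atomic exchange only when the lock looks free;
   otherwise spin with plain relaxed loads.  While spinning, the cache
   line stays shared, so waiting CPUs do not fight over exclusive
   ownership the way a CAS/exchange loop would.  */
static void
ttas_lock (int *lock)
{
  while (__atomic_exchange_n (lock, 1, __ATOMIC_ACQUIRE) != 0)
    while (__atomic_load_n (lock, __ATOMIC_RELAXED) != 0)
      ;
}

static void
ttas_unlock (int *lock)
{
  __atomic_store_n (lock, 0, __ATOMIC_RELEASE);
}
```

This is the generic shape Torvald refers to; backoff between the relaxed loads would be a further refinement.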
I had a look at the assembler output of the generic spinlock code, e.g.:
int
pthread_spin_trylock (pthread_spinlock_t *lock)
{
return atomic_exchange_acq (lock, 1) ? EBUSY : 0;
}
In the s390x assembler output, a new stack frame is generated; the lock
value is loaded, stored to the stack, reloaded from the stack, and then
passed to the cs instruction.
After the cs instruction, the "old value" is stored to the stack,
reloaded, and then compared against zero.
I also had a look at the aarch64 pthread_spin_trylock.os compiled with
the build-many script and GCC 6.2.
aarch64 uses the __atomic builtins for atomic_exchange_acq,
atomic_compare_and_exchange_val_acq, etc.
The atomic_exchange_acq results in the exclusive load/store pair. Then
the old value is also stored to the stack and immediately loaded back
in order to compare it against zero.
The type pthread_spinlock_t is a volatile int!
If lock is cast to (int *) before being passed to the atomic macros,
the assembler output looks okay.
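For illustration, the cast looks like this; "spin_trylock_cast" is a hypothetical name, not a proposed glibc function:

```c
#include <assert.h>
#include <errno.h>

/* The (int *) cast drops the volatile qualifier before the atomic
   operation, so __typeof-based temporaries inside the atomic macros
   are no longer forced through memory.  */
static int
spin_trylock_cast (volatile int *lock)
{
  return __atomic_exchange_n ((int *) lock, 1, __ATOMIC_ACQUIRE)
         ? EBUSY : 0;
}
```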
If I change the current generic spinlock code, do I have to rewrite it
to use the C11-like macros?
I've tested the following function in advance:
int
foo (pthread_spinlock_t *lock)
{
return atomic_load_acquire (lock);
}
With the __atomic_load_n version of atomic_load_acquire, the assembler
output contains a single load whose result is returned, as expected.
The non-C11 macro results in the following:
0: 58 10 20 00 l %r1,0(%r2)
4: b3 c1 00 0f ldgr %f0,%r15
8: e3 f0 ff 58 ff 71 lay %r15,-168(%r15)
e: 50 10 f0 a4 st %r1,164(%r15)
12: 58 10 f0 a4 l %r1,164(%r15)
16: 50 10 f0 a0 st %r1,160(%r15)
1a: 58 20 f0 a0 l %r2,160(%r15)
1e: b3 cd 00 f0 lgdr %r15,%f0
22: b9 14 00 22 lgfr %r2,%r2
26: 07 fe br %r14
The extra stores to and loads from the stack result from the
"__typeof (*(mem)) __atg101_val" usage in the atomic_load_* macros in
conjunction with the volatile qualifier.
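The effect can be reproduced with a small macro that mirrors the shape of the non-C11 macros (the names here are made up for illustration): with a volatile int * argument, __typeof (*(mem)) is "volatile int", so every access to the temporary must go through memory, producing the stack traffic shown in the disassembly above. Casting first gives __typeof the unqualified type int.

```c
#include <assert.h>

/* Same shape as the non-C11 glibc atomic macros: declare a temporary
   with __typeof (*(mem)).  If *mem is volatile, the temporary is
   volatile too, and the compiler must spill it to the stack.  */
#define load_via_typeof(mem)              \
  ({ __typeof (*(mem)) __my_val = *(mem); \
     __my_val; })

/* Stripping the qualifier with a cast makes the temporary a plain
   int, which can live in a register.  */
#define load_via_cast(mem) load_via_typeof ((int *) (mem))
```

Both macros return the same value; only the generated code differs.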
As this behaviour is not s390-specific, I've grepped to see which
architectures use the generic spin-lock code and the C11-like atomic macros:
sysdeps/hppa/nptl/pthread_spin_lock.c:18:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/unix/sysv/linux/hppa/atomic-machine.h:40:#define USE_ATOMIC_COMPILER_BUILTINS 0
sysdeps/microblaze/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/microblaze/atomic-machine.h:39:#define USE_ATOMIC_COMPILER_BUILTINS 0
sysdeps/aarch64/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/aarch64/atomic-machine.h:40:#define USE_ATOMIC_COMPILER_BUILTINS 1
sysdeps/mips/nptl/pthread_spin_lock.c:18:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/mips/atomic-machine.h:95:#define USE_ATOMIC_COMPILER_BUILTINS 1
sysdeps/mips/atomic-machine.h:216:#define USE_ATOMIC_COMPILER_BUILTINS 0
sysdeps/nios2/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/unix/sysv/linux/nios2/atomic-machine.h:35:#define USE_ATOMIC_COMPILER_BUILTINS 0
sysdeps/arm/nptl/pthread_spin_lock.c:18:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/arm/atomic-machine.h:37:#define USE_ATOMIC_COMPILER_BUILTINS 0
sysdeps/m68k/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/m68k/m680x0/m68020/atomic-machine.h:48:#define USE_ATOMIC_COMPILER_BUILTINS 0
sysdeps/m68k/coldfire/atomic-machine.h:54:#define USE_ATOMIC_COMPILER_BUILTINS 0
sysdeps/unix/sysv/linux/m68k/coldfire/atomic-machine.h:40:#define USE_ATOMIC_COMPILER_BUILTINS 0
What is your suggestion on how to proceed with the volatile int type in
conjunction with the atomic macros?
Bye Stefan
Thread overview: 15+ messages
2016-11-23 15:09 [PATCH 1/2] S390: Optimize atomic macros Stefan Liebler
2016-11-23 15:09 ` [PATCH 2/2] S390: Test if lock is free before using atomic instruction in spin-lock Stefan Liebler
2016-11-24 13:04 ` Torvald Riegel
2016-12-09 15:17 ` Stefan Liebler [this message]
2016-12-16 17:08 ` Stefan Liebler
2016-12-16 19:21 ` Torvald Riegel
2016-12-16 20:12 ` Florian Weimer
2016-11-24 12:51 ` [PATCH 1/2] S390: Optimize atomic macros Torvald Riegel
2016-12-09 15:17 ` Stefan Liebler
2016-12-16 17:08 ` Stefan Liebler
2016-12-16 19:15 ` Torvald Riegel
2017-02-27 11:36 ` Stefan Liebler
2017-02-28 7:33 ` Torvald Riegel
2017-03-06 14:42 ` Stefan Liebler
2016-12-16 19:18 ` Torvald Riegel