public inbox for libc-alpha@sourceware.org
From: Stefan Liebler <stli@linux.vnet.ibm.com>
To: libc-alpha@sourceware.org
Subject: Re: [PATCH 2/2] S390: Test if lock is free before using atomic instruction in spin-lock.
Date: Fri, 09 Dec 2016 15:17:00 -0000
Message-ID: <b493ee72-62f6-e630-1dfa-5a2f678cf481@linux.vnet.ibm.com>
In-Reply-To: <1479992650.7146.1535.camel@localhost.localdomain>

On 11/24/2016 02:04 PM, Torvald Riegel wrote:
> On Wed, 2016-11-23 at 16:09 +0100, Stefan Liebler wrote:
>> This patch changes the behaviour of pthread_spin_lock on s390:
>> Instead of spinning on the lock with the compare-and-swap instruction,
>> the atomic_compare_and_exchange_bool_acq macro is used.
>> The s390 atomic_compare_and_exchange_bool_acq macro, called with
>> constant zero as oldval, first compares the lock with normal
>> instructions.  If it is free, the compare-and-swap instruction is
>> used to acquire the lock.  While the lock is held by one CPU, another
>> CPU will not exclusively lock the lock's memory in a loop.  If the
>> lock is unlocked by another CPU, this is observed by the normal load
>> instruction and the lock is acquired with a compare-and-swap
>> instruction.
>
> I don't want to have more arch-specific code than we really need.
> Please compare what you have against the generic spinlock code.  If you
> find the generic spinlock code lacking, then please propose a change for
> it in a way that does not make things worse for other architectures.
>
> If there are arch-specific properties that significantly affect the
> approach the generic spinlock takes (eg, assumptions about CAS vs
> atomic-exchange runtime overheads), then please split them out.
>
> In the long term, this will also benefit s390.  For example, the
> spinlock code you have has no backoff, and introduces spinning with
> loads in a pretty ugly way through the hack you have added to the CAS
> (first loading the lock's value and comparing it manually if the
> supplied argument is a constant).
> While the generic spinlock code we have is also not very great,
> improving it is what will make life easier for the whole glibc project.
> Also, if someone else improves the generic spinlock code, s390 would
> miss out on this if you have added your custom spinlock code.
>
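
For reference, the "test before CAS" pattern described above looks
roughly like this (an illustrative sketch of the idea, not the actual
s390 atomic_compare_and_exchange_bool_acq macro):

static inline int
try_lock_test_before_cas (volatile int *lock)
{
  /* Plain load first: while the lock is held, spinning only reads the
     lock word and does not take exclusive ownership of its cache
     line.  */
  if (*lock != 0)
    return 1;  /* Busy; the caller keeps spinning with plain loads.  */

  /* The lock looks free; only now issue the expensive CAS.  */
  return !__sync_bool_compare_and_swap (lock, 0, 1);
}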

I had a look at the assembler output of the generic spinlock code, e.g.:
int
pthread_spin_trylock (pthread_spinlock_t *lock)
{
  return atomic_exchange_acq (lock, 1) ? EBUSY : 0;
}

In the s390x assembler output, a new stack frame is created; the lock
value is loaded, stored to the stack, reloaded from the stack, and then
passed to the cs instruction.  After the cs instruction, the old value
is stored to the stack, reloaded from the stack, and then compared
against zero.

I also had a look at the aarch64 pthread_spin_trylock.os compiled with
the build-many-glibcs script and GCC 6.2.
aarch64 uses the __atomic builtins for atomic_exchange_acq,
atomic_compare_and_exchange_val_acq, ... .
The atomic_exchange_acq results in the expected exclusive load/store,
but the old value is then also stored to the stack and immediately
loaded back in order to compare it against zero.

The type pthread_spinlock_t is a volatile int!
If lock is cast to (int *) before being passed to the atomic macros,
the assembler output looks okay.
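
Applied to the generic trylock above, the cast would look like this
(just a sketch to illustrate the point):

int
pthread_spin_trylock (pthread_spinlock_t *lock)
{
  /* Casting away the volatile qualifier means __typeof inside the
     atomic macros yields a plain int, so the temporary can stay in a
     register instead of being spilled to the stack.  */
  return atomic_exchange_acq ((int *) lock, 1) ? EBUSY : 0;
}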

If I change the current generic spinlock code, do I have to rewrite it
to use the C11-like atomic macros?

As a test, I compiled the following function:
int
foo (pthread_spinlock_t *lock)
{
  return atomic_load_acquire (lock);
}

With the __atomic_load_n version of atomic_load_acquire, the assembler
output contains a single load whose result is returned, as expected.
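(The C11-style macro is essentially a direct call to the builtin,
roughly __atomic_load_n (mem, __ATOMIC_ACQUIRE); no __typeof temporary,
and hence no volatile local, is created.)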

The non-C11 macro results in the following:
    0:   58 10 20 00             l       %r1,0(%r2)
    4:   b3 c1 00 0f             ldgr    %f0,%r15
    8:   e3 f0 ff 58 ff 71       lay     %r15,-168(%r15)
    e:   50 10 f0 a4             st      %r1,164(%r15)
   12:   58 10 f0 a4             l       %r1,164(%r15)
   16:   50 10 f0 a0             st      %r1,160(%r15)
   1a:   58 20 f0 a0             l       %r2,160(%r15)
   1e:   b3 cd 00 f0             lgdr    %r15,%f0
   22:   b9 14 00 22             lgfr    %r2,%r2
   26:   07 fe                   br      %r14
The extra stores to and loads from the stack result from the
"__typeof (*(mem)) __atg101_val" usage in the atomic_load_* macros in
conjunction with the volatile qualifier.

As this behaviour is not s390-specific, I grepped to see which arches
use the generic spinlock code and the C11-like atomic macros (a sketch
of the generic lock loop follows the list):
sysdeps/hppa/nptl/pthread_spin_lock.c:18:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/unix/sysv/linux/hppa/atomic-machine.h:40:#define USE_ATOMIC_COMPILER_BUILTINS 0

sysdeps/microblaze/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/microblaze/atomic-machine.h:39:#define USE_ATOMIC_COMPILER_BUILTINS 0

sysdeps/aarch64/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/aarch64/atomic-machine.h:40:#define USE_ATOMIC_COMPILER_BUILTINS 1

sysdeps/mips/nptl/pthread_spin_lock.c:18:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/mips/atomic-machine.h:95:#define USE_ATOMIC_COMPILER_BUILTINS 1
sysdeps/mips/atomic-machine.h:216:#define USE_ATOMIC_COMPILER_BUILTINS 0

sysdeps/nios2/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/unix/sysv/linux/nios2/atomic-machine.h:35:#define USE_ATOMIC_COMPILER_BUILTINS 0

sysdeps/arm/nptl/pthread_spin_lock.c:18:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/arm/atomic-machine.h:37:#define USE_ATOMIC_COMPILER_BUILTINS 0

sysdeps/m68k/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/m68k/m680x0/m68020/atomic-machine.h:48:#define USE_ATOMIC_COMPILER_BUILTINS 0
sysdeps/m68k/coldfire/atomic-machine.h:54:#define USE_ATOMIC_COMPILER_BUILTINS 0
sysdeps/unix/sysv/linux/m68k/coldfire/atomic-machine.h:40:#define USE_ATOMIC_COMPILER_BUILTINS 0
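
For context, the generic lock loop that consumes
SPIN_LOCK_READS_BETWEEN_CMPXCHG looks roughly like this (paraphrased
from sysdeps/nptl/pthread_spin_lock.c; comments and details elided):

int
pthread_spin_lock (pthread_spinlock_t *lock)
{
  /* The uncontended case is expected to be the common one, so try an
     exchange first.  */
  if (__glibc_likely (atomic_exchange_acq (lock, 1) == 0))
    return 0;

  do
    {
      /* Contended: spin with plain reads for a while instead of
         hammering the lock with CAS, to limit bus traffic.  */
      int wait = SPIN_LOCK_READS_BETWEEN_CMPXCHG;
      while (*lock != 0 && wait > 0)
        --wait;
    }
  while (atomic_compare_and_exchange_val_acq (lock, 1, 0) != 0);

  return 0;
}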

What is your suggestion for how to proceed with the volatile int type
in conjunction with the atomic macros?

Bye Stefan


Thread overview: 15+ messages
2016-11-23 15:09 [PATCH 1/2] S390: Optimize atomic macros Stefan Liebler
2016-11-23 15:09 ` [PATCH 2/2] S390: Test if lock is free before using atomic instruction in spin-lock Stefan Liebler
2016-11-24 13:04   ` Torvald Riegel
2016-12-09 15:17     ` Stefan Liebler [this message]
2016-12-16 17:08       ` Stefan Liebler
2016-12-16 19:21         ` Torvald Riegel
2016-12-16 20:12       ` Florian Weimer
2016-11-24 12:51 ` [PATCH 1/2] S390: Optimize atomic macros Torvald Riegel
2016-12-09 15:17   ` Stefan Liebler
2016-12-16 17:08     ` Stefan Liebler
2016-12-16 19:15       ` Torvald Riegel
2017-02-27 11:36         ` Stefan Liebler
2017-02-28  7:33           ` Torvald Riegel
2017-03-06 14:42             ` Stefan Liebler
2016-12-16 19:18       ` Torvald Riegel
