public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Stefan Liebler <stli@linux.vnet.ibm.com>
To: libc-alpha@sourceware.org
Cc: Torvald Riegel <triegel@redhat.com>
Subject: Re: [PATCH 1/2] Optimize generic spinlock code and use C11 like atomic macros.
Date: Wed, 29 Mar 2017 14:16:00 -0000	[thread overview]
Message-ID: <8a4e784b-9ffa-9c73-2c39-550991235557@linux.vnet.ibm.com> (raw)
In-Reply-To: <b57d3477-a041-7b06-82ac-6d2b6c6bb08c@linux.vnet.ibm.com>

On 03/14/2017 04:55 PM, Stefan Liebler wrote:
> On 02/20/2017 02:51 PM, Torvald Riegel wrote:
>> On Mon, 2017-02-20 at 13:15 +0100, Stefan Liebler wrote:
>>>>>> A while ago we tried hard to remove all code that would fail silently
>>>>>> when a macro had a typo in the name, for example.  We still have a
>>>>>> few
>>>>>> of them left (e.g., in the atomics), but I think this here doesn't
>>>>>> warrant an exception.
>>>>>>
>>>>> Okay. You're right.
>>>>>
>>>>> In comment of patch 2/2, you've mentioned a header where an
>>>>> architecture
>>>>> shall define those parameters.
>>>>> Thus I've added the file nptl/pthread_spin_parameters.h.
>>>>
>>>> I think the stub version should be in sysdeps/ but the wiki is missing
>>>> this information (default ENOTSUP location):
>>>> https://sourceware.org/glibc/wiki/Style_and_Conventions#Proper_sysdeps_Location
>>>>
>>> In case we need a new header, shall we move it to sysdeps/nptl/ ?
>>
>> I would guess that this would be the right place for a stub / generic
>> variant of such a header, but I'm not quite sure.
>>
>>>>
>>>> I'm still a bit undecided about how to deal with other arch's
>>>> pthread_spin_lock.c files that just set SPIN_LOCK_READS_BETWEEN_CMPXCHG
>>>> and then include the generic pthread_spin_lock.c.  Maybe it's best
>>>> if we
>>>> just remove them in this patch of yours.
>>>>
>>> I think the archs currently using the generic implementation have just
>>> copied the default value to get rid of the warning "machine-dependent
>>> file should define SPIN_LOCK_READS_BETWEEN_CMPXCHG". But this is only an
>>> assumption.
>>> In general removing those "wrapper"-pthread_spin_lock.c files and using
>>> information from a header like the proposed pthread_spin_parameters.h or
>>> atomic-machine.h is a good approach.
>>
>> Okay.  So let's do that.
>>
> Done. The wrapper-pthread_spin_lock.c files in sysdeps subfolders are
> removed and the option SPIN_LOCK_READS_BETWEEN_CMPXCHG is not available
> anymore.
> Instead there is only a loop of plain reads until we observe an not
> acquired lock.
> See comment in nptl/pthread_spin_lock.c.
>
>>>> I agree that the there are architecture-specific properties of atomic
>>>> operations that have a significant effect on performance.  For s390,
>>>> you
>>>> list the following points:
>>>> * atomic exchange (I suppose all atomic read-modify-write ops?) are
>>>> implemented through CAS.
>>> atomic exchange is implemented by a CAS loop due to lack of an exchange
>>> instruction. For exchanging to a "0", s390 can use the load-and-and
>>> instruction instead of a CAS loop (See my atomic-machine.h patch; For
>>> gcc, we plan to emit such a load-and-and instruction for builtin
>>> __atomic_exchange_n in future;). This also saves one register usage
>>> compared to the CAS loop.
>>> Exchanging to "0" is e.g. used for lll_unlock macro or in
>>> malloc_consolidate.
>>
>> I guess this may then be yet another property atomic-machine.h could
>> expose: Whether an atomic fetch-and-and exists and is not implemented
>> throough a CAS loop.  Something like lll_unlock would then do an
>> exchange or fetch-and-and, and if that fails, a CAS loop.
>>
>>> Starting with z196 zarch CPUs, the following instructions which are used
>>> by the appropriate __atomic_fetch_xyz builtins instead of a CAS loop:
>>> load-and-add, load-and-and, load-and-exclusive-or, load-and-or.
>>> As information:
>>> I will update my atomic-machine.h patch to use at least some of the C11
>>> atomic builtins or all depending on gcc version.
>>
>> Thanks.
>>
>>>> These properties are specific to the architecture, but they are not
>>>> specific to the synchronization code (eg, to spinlocks).  Thus, I think
>>>> such settings should be in the atomics headers (i.e., in
>>>> atomic-machine.h).
>>>> This should probably be a separate patch.  I would propose the
>>>> following
>>>> macros (both are bool):
>>>> /* Nonzero iff all atomic read-modify-write operations (e.g., atomic
>>>> exchange) are implemented using a CAS loop.  */
>>>> #define ATOMIC_RMW_USES_CAS 1
>>> Is one macro enough to describe all read-modify-write operations in
>>> include/atomic.h?
>>> Please also see my comment above.
>>
>> Yes, it may not be enough (e.g., you say that s390 may have fetch_and
>> but not an exchange, but other archs may just have an exchange but no
>> custom fetch_and).
>> So maybe we should add ATOMIC_EXCHANGE_USES_CAS and
>> ATOMIC_FETCH_AND_USES_CAS for now, and add further ones on demand?
>>
>>>> /* Nonzero iff CAS always assumes it will store, and thus has cache
>>>> performance effects similar to a store (e.g., it always puts the
>>>> respective cacheline into an exclusive state, even if the comparison
>>>> against the expected value fails).  */
>>>> ATOMIC_CAS_ALWAYS_PREPARES_FOR_STORE 1
>>>>
>>>> I'm not sure whether there are architectures for which the second macro
>>>> would not be true.  It would be good to investigate this, maybe we
>>>> don't
>>>> need to add this at all.
>>>>
>>> We plan that in future gcc will emit e.g. a load-and-test instruction in
>>> front of the CAS instruction if the old-value is a constant zero.
>>
>> That can be useful, but what I outlined would be about a more generic
>> case.  If you are going to solve this for your arch in gcc, you might
>> not need to address this in this patch.
>>
>>>>
>>>>
>>>> For the spin lock specifically, we have the following possibilities:
>>>>
>>>> 1) If we assume that trylock will most likely succeed in practice:
>>>> * We just do an exchange.  The properties above don't matter.
>>>>
>>>> 2) If we want to bias towards cases where trylock succeeds, but don't
>>>> rule out contention:
>>>> * If exchange not a CAS loop, and exchange is faster than CAS, do an
>>>> exchange.
>>>> * If exchange is a CAS loop, use a weak CAS and not an exchange so we
>>>> bail out after the first failed attempt to change the state.
>>>>
>>>> 3) If we expect contention to be likely:
>>>> * If CAS always brings the cache line into an exclusive state, then
>>>> load
>>>> first.  Then do 2).
>>>>
>>>> Do we have consensus yet on whether we assume 2 or 3 (I don't think 1
>>>> buys us anything over 2)?
>>>>
>>> Yes. Sounds good.
>>
>> I read this as stating that you agree that 1) doesn't need to be
>> considered.  But we still to choose between 2) and 3)  :)
>>
>> I'm not quite sure what to favor because some of the code out there that
>> uses trylock just uses it to prevent deadlock (e.g., when trying to
>> acquire multiple locks at once), whereas other code uses spin_trylock to
>> implement their own back-off and bounded spinning.
>> I lean towards assuming that lock acquisitons, including the former use
>> case for trylock, are supposed to succeed in the common case.  That is,
>> I would pick 2), but I have no data to back this up.
>>
>> Either way, whatever we choose, we should document polish the cases 1-3
>> above and the reasoning behind our choice for one of them, and add it as
>> a comment to the spinlock code.
>>
>
> Okay. I've attached an updated patch. It is now using case 2).
> This choice applies to pthread_spin_trylock.c and the first attempt to
> acquire the lock in pthread_spin_lock.c.
> Therefore I've introduced ATOMIC_EXCHANGE_USES_CAS for all architectures
> in atomic-machine.h files. There is a check in include/atomic.h which
> ensures that it is defined to either 0 or 1. Can you please review the
> setting of 0 or 1?
>
> Bye Stefan


I link the post of Torvald Riegel in this spinlock mailthread, so we 
have all the discussions here. See
"Re: [PATCH v3 0/6] Add support for ISO C11 threads.h"
(https://www.sourceware.org/ml/libc-alpha/2017-03/msg00615.html):

Current DR log: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2109.htm

470:
C11's requirements on implementations are weaker than what POSIX
requires. Arguably, POSIX should require what C11 requires.
We need to check the lowlevellock implementation, in particular on archs
that use LL/SC (I'm not aware of an arch with a true CAS instruction
that can fail spuriously).
Our generic lowlevellock implementation on master still uses the
old-style atomics (ie, atomic_compare_and_exchange_bool_acq); if we move
to C11 atomics, C11 (with DR 470's proposed corrigendum applied) would
require our atomic_compare_exchange_weak_acquire, whereas current POSIX
would require the strong variant of the CAS.
Thus, what we do is okay in terms of C11.
I don't recall right now whether we have an open Austin Group bug about
POSIX allowing a spurious failure; I believe we have, or we should,
because this is also related to what memory order is required for lock
acquisitions (generally, not just trylock).
(Stefan, that's why I'm CC'ing you too, this is relevant for
pthread_spin_trylock too, but I can't remember right now whether we
already discussed this or not.)

Thus you mean we have to use a "atomic_compare_exchange_strong_acquire" 
in pthread_spin_trylock due to POSIX?

I've found your "Bug 17428 - lll_trylock barrier semantics when lock was 
not acquired differ between architectures" 
(https://sourceware.org/bugzilla/show_bug.cgi?id=17428)
and an older post: "Re: [PATCHv2] powerpc: Spinlock optimization and 
cleanup" (https://sourceware.org/ml/libc-alpha/2015-10/msg00012.html).

Bye
Stefan

  parent reply	other threads:[~2017-03-29 14:16 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-16 16:32 Stefan Liebler
2016-12-16 16:32 ` [PATCH 2/2] S390: Use generic spinlock code Stefan Liebler
2017-02-08 14:49   ` Stefan Liebler
2017-02-13 20:39     ` Torvald Riegel
2017-02-15 16:26       ` Stefan Liebler
2017-02-18 17:05         ` Torvald Riegel
2017-03-14 15:55           ` Stefan Liebler
2017-03-21 15:43             ` Stefan Liebler
2017-04-06 12:27             ` Torvald Riegel
2016-12-19 12:14 ` [PATCH 1/2] Optimize generic spinlock code and use C11 like atomic macros Szabolcs Nagy
2017-02-08 14:49   ` Stefan Liebler
2017-02-13 20:29     ` Torvald Riegel
2017-02-15  9:36       ` Stefan Liebler
2017-02-18 16:57         ` Torvald Riegel
2017-02-19  9:20           ` Florian Weimer
2017-02-20 13:11             ` Torvald Riegel
2017-02-26  7:55               ` Florian Weimer
2017-02-26 20:06                 ` Torvald Riegel
2017-02-26 20:29                   ` Florian Weimer
2017-02-26 20:35                     ` Torvald Riegel
2017-02-27 17:57                       ` Szabolcs Nagy
2017-02-28  7:15                         ` Torvald Riegel
2017-03-14 15:55                           ` Stefan Liebler
2017-02-20 12:15           ` Stefan Liebler
2017-02-20 13:51             ` Torvald Riegel
2017-03-14 15:55               ` Stefan Liebler
2017-03-21 15:43                 ` Stefan Liebler
2017-03-22 12:56                   ` Szabolcs Nagy
2017-03-23 16:16                     ` Stefan Liebler
2017-03-23 17:52                       ` Szabolcs Nagy
2017-04-06 12:04                     ` Torvald Riegel
2017-03-27 13:08                   ` Stefan Liebler
2017-04-04 10:29                     ` [PING] " Stefan Liebler
2017-03-29 14:16                 ` Stefan Liebler [this message]
2017-04-06 14:00                 ` Torvald Riegel
2017-04-07 16:23                   ` Stefan Liebler
2017-04-09 13:51                     ` Torvald Riegel
2017-04-10 12:00                       ` Stefan Liebler
2017-04-18 13:09                         ` Stefan Liebler
2017-04-25  6:47                           ` Stefan Liebler
2017-05-03 11:38                             ` [PATCH 1/2] [PING] " Stefan Liebler
2017-05-10 13:00                               ` Stefan Liebler
2017-05-17 13:09                                 ` Stefan Liebler
2017-05-24  6:37                                   ` Stefan Liebler
2017-05-30  7:18                                 ` Torvald Riegel
2017-05-31  8:29                                   ` Stefan Liebler
2017-05-31 16:48                                     ` Torvald Riegel
2017-06-01 13:40                                       ` Joseph Myers
2017-06-01 14:33                                         ` Torvald Riegel
2017-06-06  7:51                                       ` [PATCH 1/2] [COMMITTED] " Stefan Liebler
2017-04-10  8:17                     ` [PATCH 1/2] " Andreas Schwab
2017-04-10 12:00                       ` Stefan Liebler
2017-04-10 13:36                         ` Andreas Schwab
2017-04-11  7:06                           ` Stefan Liebler
2017-04-11  8:45                             ` Andreas Schwab
2017-04-11 10:15                               ` Stefan Liebler
2017-04-11 12:05                                 ` Andreas Schwab
2017-04-11 12:19                                   ` Stefan Liebler
2017-04-11 13:08                                   ` Zack Weinberg
2017-04-13 16:36                             ` Torvald Riegel
2017-05-30 21:00                     ` Tulio Magno Quites Machado Filho
2017-04-18 21:17                   ` Joseph Myers
2017-04-19  8:27                     ` Stefan Liebler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8a4e784b-9ffa-9c73-2c39-550991235557@linux.vnet.ibm.com \
    --to=stli@linux.vnet.ibm.com \
    --cc=libc-alpha@sourceware.org \
    --cc=triegel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).