From: kemi <kemi.wang@intel.com>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>,
Florian Weimer <fweimer@redhat.com>, Rical Jason <rj@2c3t.io>,
Carlos Donell <carlos@redhat.com>,
Glibc alpha <libc-alpha@sourceware.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Tim Chen <tim.c.chen@intel.com>,
Andi Kleen <andi.kleen@intel.com>,
Ying Huang <ying.huang@intel.com>, Aaron Lu <aaron.lu@intel.com>,
Lu Aubrey <aubrey.li@intel.com>
Subject: Re: Re: [PATCH v6 3/3] Mutex: Replace trylock by read only while spinning
Date: Wed, 04 Jul 2018 04:54:00 -0000
Message-ID: <e9a879ea-b42a-d389-9f57-9e429c8a08ea@intel.com>
In-Reply-To: <CAMe9rOrUNrKO8JHXCZNGGix2QVyO=r74h4XMYdop-iAw52VbYg@mail.gmail.com>
On 2018/07/03 21:16, H.J. Lu wrote:
> On Tue, Jul 3, 2018 at 5:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Jul 2, 2018 at 1:27 AM, Kemi Wang <kemi.wang@intel.com> wrote:
>>> The pthread adaptive spin mutex spins on the lock for a while before
>>> calling into the kernel to block. In the current implementation of
>>> spinning, however, the spinners go straight back to LLL_MUTEX_TRYLOCK
>>> (cmpxchg) while the lock is still contended. This is a bad idea on many
>>> targets, because the cmpxchg forces expensive memory synchronization
>>> among processors and penalizes other running threads: it constantly
>>> floods the system with "read for ownership" requests, which are much
>>> more expensive to process than a single read. Thus, spin with a plain
>>> MO read until we observe that the lock is no longer acquired, as
>>> suggested by Andi Kleen.
>>>
>>> Performance impact:
>>> A significant mutex performance improvement is not expected from this
>>> patch by itself, but it should bring some benefit in scenarios with
>>> severe lock contention on many architectures. Whole-system performance
>>> can benefit because the unnecessary "read for ownership" requests,
>>> which stress the cache system by broadcasting cache line invalidations,
>>> are eliminated during spinning.
>>> Meanwhile, lock-holder handover may see a tiny regression when the lock
>>> is acquired via spinning, because the lock state is now checked before
>>> the lock is acquired via trylock.
>>>
>>> A similar mechanism has already been implemented for the pthread spin
>>> lock.
>>>
>>> Test machine:
>>> 2-socket Skylake platform, 112 cores, 62 GB RAM
>>>
>>> Test case: mutex-adaptive-thread (Contended pthread adaptive spin mutex
>>> with global update)
>>> Usage: make bench BENCHSET=mutex-adaptive-thread
>>> Test result:
>>> +----------------+-----------------+-----------------+------------+
>>> | Configuration | Base | Head | % Change |
>>> | | Total iteration | Total iteration | base->head |
>>> +----------------+-----------------+-----------------+------------+
>>> | | Critical section size: 1x |
>>> +----------------+------------------------------------------------+
>>> |1 thread | 2.76681e+07 | 2.7965e+07 | +1.1% |
>>> |2 threads | 3.29905e+07 | 3.55279e+07 | +7.7% |
>>> |3 threads | 4.38102e+07 | 3.98567e+07 | -9.0% |
>>> |4 threads | 1.72172e+07 | 2.09498e+07 | +21.7% |
>>> |28 threads | 1.03732e+07 | 1.05133e+07 | +1.4% |
>>> |56 threads | 1.06308e+07 | 5.06522e+07 | +14.6% |
>>> |112 threads | 8.55177e+06 | 1.02954e+07 | +20.4% |
>>> +----------------+------------------------------------------------+
>>> | | Critical section size: 10x |
>>> +----------------+------------------------------------------------+
>>> |1 thread | 1.57006e+07 | 1.54727e+07 | -1.5% |
>>> |2 threads | 1.8044e+07 | 1.75601e+07 | -2.7% |
>>> |3 threads | 1.35634e+07 | 1.46384e+07 | +7.9% |
>>> |4 threads | 1.21257e+07 | 1.32046e+07 | +8.9% |
>>> |28 threads | 8.09593e+06 | 1.02713e+07 | +26.9% |
>>> |56 threads | 9.09907e+06 | 4.16203e+07 | +16.4% |
>>> |112 threads | 7.09731e+06 | 8.62406e+06 | +21.5% |
>>> +----------------+------------------------------------------------+
>>> | | Critical section size: 100x |
>>> +----------------+------------------------------------------------+
>>> |1 thread | 2.87116e+06 | 2.89188e+06 | +0.7% |
>>> |2 threads | 2.23409e+06 | 2.24216e+06 | +0.4% |
>>> |3 threads | 2.29888e+06 | 2.29964e+06 | +0.0% |
>>> |4 threads | 2.26898e+06 | 2.21394e+06 | -2.4% |
>>> |28 threads | 1.03228e+06 | 1.0051e+06 | -2.6% |
>>> |56 threads      |  1.02953e+06    |  1.6344e+07     |   -2.3%    |
>>> |112 threads | 1.01615e+06 | 1.00134e+06 | -1.5% |
>>> +----------------+------------------------------------------------+
>>> | | Critical section size: 1000x |
>>> +----------------+------------------------------------------------+
>>> |1 thread | 316392 | 315635 | -0.2% |
>>> |2 threads | 302806 | 303469 | +0.2% |
>>> |3 threads | 298506 | 294281 | -1.4% |
>>> |4 threads | 292037 | 289945 | -0.7% |
>>> |28 threads | 155188 | 155250 | +0.0% |
>>> |56 threads | 190657 | 183106 | -4.0% |
>>> |112 threads | 210818 | 220342 | +4.5% |
>>> +----------------+-----------------+-----------------+------------+
>>>
>>> * nptl/pthread_mutex_lock.c: Optimize adaptive spin mutex.
>>> * nptl/pthread_mutex_timedlock.c: Likewise.
>>> * sysdeps/unix/sysv/linux/x86/pthread_mutex_lock.c: Enable read-only
>>> spinning for the x86 architecture.
>>> * sysdeps/unix/sysv/linux/x86/pthread_mutex_cond_lock.c: Likewise.
>>> * sysdeps/unix/sysv/linux/x86/pthread_mutex_timedlock.c: Likewise.
>>>
>>> ChangeLog:
>>> V5->V6:
>>> no change
>>>
>>> V4->V5:
>>> a) Make the optimization work for pthread_mutex_timedlock () on x86.
>>> b) Move the READ_ONLY_SPIN macro definition from this patch to the
>>> first patch, which adds the glibc.mutex.spin_count tunable entry.
>>>
>>> V3->V4:
>>> a) Make the optimization opt-in, and enable it by default for x86,
>>> as suggested by Florian Weimer.
>>>
>>> V2->V3:
>>> a) Drop the idea of blocking spinners that fail to acquire the lock,
>>> since it would not be a universal winner. E.g. when several threads
>>> contend for a lock that protects a small critical section, any thread
>>> can probably acquire the lock via spinning.
>>> b) Fix the format issues AFAIC.
>>>
>>> V1->V2: fix format issue
>>>
>>> Suggested-by: Andi Kleen <andi.kleen@intel.com>
>>> Signed-off-by: Kemi Wang <kemi.wang@intel.com>
>>> ---
>>> nptl/pthread_mutex_lock.c | 15 +++++++++++++++
>>> nptl/pthread_mutex_timedlock.c | 15 +++++++++++++++
>>> sysdeps/unix/sysv/linux/x86/pthread_mutex_cond_lock.c | 1 +
>>> sysdeps/unix/sysv/linux/x86/pthread_mutex_lock.c | 1 +
>>> sysdeps/unix/sysv/linux/x86/pthread_mutex_timedlock.c | 1 +
>>> 5 files changed, 33 insertions(+)
>>>
>>> diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
>>> index 1519c14..9a7b5c2 100644
>>> --- a/nptl/pthread_mutex_lock.c
>>> +++ b/nptl/pthread_mutex_lock.c
>>> @@ -133,7 +139,16 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
>>> LLL_MUTEX_LOCK (mutex);
>>> break;
>>> }
>>> +#ifdef READ_ONLY_SPIN
>>> + do
>>> + {
>>> + atomic_spin_nop ();
>>> + val = atomic_load_relaxed (&mutex->__data.__lock);
>>> + }
>>> + while (val != 0 && ++cnt < max_cnt);
>>> +#else
>>> atomic_spin_nop ();
>>> +#endif
>>> }
>>> while (LLL_MUTEX_TRYLOCK (mutex) != 0);
>>>
>>> diff --git a/nptl/pthread_mutex_timedlock.c b/nptl/pthread_mutex_timedlock.c
>>> index 66efd39..dcaeca8 100644
>>> --- a/nptl/pthread_mutex_timedlock.c
>>> +++ b/nptl/pthread_mutex_timedlock.c
>>> @@ -126,7 +132,16 @@ __pthread_mutex_timedlock (pthread_mutex_t *mutex,
>>> PTHREAD_MUTEX_PSHARED (mutex));
>>> break;
>>> }
>>> +#ifdef READ_ONLY_SPIN
>>> + do
>>> + {
>>> + atomic_spin_nop ();
>>> + val = atomic_load_relaxed (&mutex->__data.__lock);
>>> + }
>>> + while (val != 0 && ++cnt < max_cnt);
>>> +#else
>>> atomic_spin_nop ();
>>> +#endif
>>> }
>>> while (lll_trylock (mutex->__data.__lock) != 0);
>>>
>>
>> Please define a generic
>>
>> static __always_inline void
>> atomic_spin_nop_with_limit (int cnt, int max_cnt, pthread_mutex_t *mutex)
>> {
>>   atomic_spin_nop ();
>> }
>>
>> and an x86 version:
>>
>> static __always_inline void
>> atomic_spin_nop_with_limit (int cnt, int max_cnt, pthread_mutex_t *mutex)
>> {
>>   int val;
>>   do
>>     {
>>       atomic_spin_nop ();
>>       val = atomic_load_relaxed (&mutex->__data.__lock);
>>     }
>>   while (val != 0 && ++cnt < max_cnt);
>> }
>>
>> in a standalone patch.
>>
>
> Please take a look at
>
> https://github.com/hjl-tools/glibc/tree/hjl/spin/master
>
Reviewed. Thanks for refining.
> I added:
>
> static __always_inline void
> atomic_spin_lock (pthread_spinlock_t *lock, int cnt, int max_cnt)
> {
> int val = 0;
> do
> {
> atomic_spin_nop ();
> val = atomic_load_relaxed (lock);
> }
> while (val != 0 && ++cnt < max_cnt);
> }
>
>
Thread overview: 15+ messages [~2018-07-04 4:54 UTC]
2018-07-02 8:31 [PATCH v6 1/3] Tunables: Add tunables of spin count for pthread adaptive spin mutex Kemi Wang
2018-07-02 8:31 ` [PATCH v6 2/3] benchtests: Add pthread adaptive spin mutex microbenchmark Kemi Wang
2018-07-02 8:31 ` [PATCH v6 3/3] Mutex: Replace trylock by read only while spinning Kemi Wang
2018-07-03 12:46 ` H.J. Lu
2018-07-03 13:17 ` H.J. Lu
2018-07-04 4:54 ` kemi [this message]
2018-07-03 12:32 ` [PATCH v6 1/3] Tunables: Add tunables of spin count for pthread adaptive spin mutex H.J. Lu
2018-07-03 13:40 ` H.J. Lu
2018-07-04 4:54 ` kemi
2018-07-04 5:55 ` Wang, Kemi
2018-07-04 13:16 ` H.J. Lu
2018-07-05 1:31 ` kemi
2018-07-05 2:16 ` H.J. Lu
2018-07-05 2:28 ` Wang, Kemi
2018-07-05 4:15 ` H.J. Lu