From: "H.J. Lu" <hjl.tools@gmail.com>
To: libc-alpha@sourceware.org
Cc: Florian Weimer <fweimer@redhat.com>,
Oleh Derevenko <oleh.derevenko@gmail.com>,
Arjan van de Ven <arjan@linux.intel.com>,
Andreas Schwab <schwab@linux-m68k.org>,
"Paul A . Clarke" <pc@us.ibm.com>,
Noah Goldstein <goldstein.w.n@gmail.com>
Subject: [PATCH v6 0/4] Optimize CAS [BZ #28537]
Date: Thu, 11 Nov 2021 08:24:24 -0800
Message-ID: <20211111162428.2286605-1-hjl.tools@gmail.com>

Changes in v6:
1. Add LLL_MUTEX_READ_LOCK to do an atomic load and skip the CAS in the
spinlock loop if the compare may fail.
2. Remove low level lock changes.
3. Don't change CAS usages in __pthread_mutex_lock_full.
4. Avoid extra load with CAS in __pthread_mutex_clocklock_common.
5. Reduce CAS in malloc spinlocks.
Changes in v5:
1. Put back __glibc_unlikely in __lll_trylock and lll_cond_trylock.
2. Remove an atomic load in a CAS usage which has already been optimized.
3. Add an empty statement with a semicolon to a goto label for older
compiler versions.
4. Simplify CAS optimization.
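The v5 item about the goto label refers to a C language rule: before C23, a
label must be followed by a statement, so a label at the very end of a block
needs an empty statement (a bare semicolon) to satisfy older compilers. A
minimal illustration (hypothetical function, not the glibc code):

```c
/* Hypothetical example: the label at the end of the block is followed
   by an empty statement (";") so that older compilers, which require a
   statement after every label, accept it.  */
static int drain(int *work, int n)
{
  for (int i = 0; i < n; i++)
    {
      if (work[i] < 0)
	goto done;
      work[i] = 0;
    }
done:;	/* Empty statement for older compiler versions.  */
  return 0;
}
```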
The CAS instruction is expensive: from the x86 CPU's point of view, getting
a cache line for writing is more expensive than reading it. See Appendix
A.2 (Spinlock) in:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
An unconditional compare-and-swap grabs the cache line in exclusive state
and causes excessive cache line bouncing.
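The load-before-CAS idea above is the classic test-and-test-and-set pattern.
A hedged sketch in C11 atomics (hypothetical spinlock, not the glibc
implementation): the spin loop reads the lock word with a plain atomic load,
which keeps the cache line in shared state across waiters, and only attempts
the CAS when the load suggests it can succeed.

```c
#include <stdatomic.h>

/* Hypothetical test-and-test-and-set spinlock sketch.
   0 = unlocked, 1 = locked.  */
static atomic_int lock_word;

static void spin_lock (void)
{
  for (;;)
    {
      /* Cheap read while the lock is held: no exclusive cache line
	 ownership is requested, so waiters do not bounce the line.  */
      while (atomic_load_explicit (&lock_word, memory_order_relaxed) != 0)
	;  /* Spin.  */
      int expected = 0;
      /* Attempt the CAS only when the load saw the lock free.  */
      if (atomic_compare_exchange_weak_explicit (&lock_word, &expected, 1,
						 memory_order_acquire,
						 memory_order_relaxed))
	return;
    }
}

static void spin_unlock (void)
{
  atomic_store_explicit (&lock_word, 0, memory_order_release);
}
```

A bare CAS loop, by contrast, requests the cache line exclusively on every
iteration, even when the CAS is doomed to fail.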
Optimize CAS in the low level locks and pthread_mutex_lock.c:
1. Add LLL_MUTEX_READ_LOCK to do an atomic load and skip the CAS in the
spinlock loop if the compare may fail, to reduce cache line bouncing on
contended locks.
2. Replace the boolean CAS with a value CAS to avoid the extra load.
3. Change the malloc spinlocks to do an atomic load and check whether the
compare may fail. Skip the CAS and spin if the compare may fail, to reduce
cache line bouncing on contended locks.
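The boolean-vs-value CAS point can be sketched with C11 atomics
(hypothetical function and variable names, not the glibc macros): a boolean
CAS only reports success or failure, so the caller must issue a second
atomic load to learn the current value after a failure, whereas a value CAS
hands back the observed value directly. C11's compare_exchange does this by
writing the observed value into `expected` on failure.

```c
#include <stdatomic.h>

/* Hypothetical lock word: 0 = unlocked, otherwise the owner's TID.  */
static atomic_int futex_word;

/* Try to acquire the lock for SELF_TID.  Returns 0 on success, or the
   current owner's TID on failure -- without a second atomic load,
   because compare_exchange stores the observed value into OBSERVED
   when the comparison fails.  */
static int try_lock_owner (int self_tid)
{
  int observed = 0;	/* Expect the lock to be free.  */
  if (atomic_compare_exchange_strong_explicit (&futex_word, &observed,
					       self_tid,
					       memory_order_acquire,
					       memory_order_relaxed))
    return 0;		/* Acquired.  */
  return observed;	/* Current owner, no extra load needed.  */
}
```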
With all CAS optimizations applied, benchmark results on a machine with
112 cores (times before and after, with the relative improvement):
benchmark                   before    after     improvement
mutex-empty 17.4575 17.3908 0.38%
mutex-filler 48.4768 46.4925 4.1%
mutex_trylock-empty 19.2726 19.2737 -0.0057%
mutex_trylock-filler 54.0893 54.105 -0.029%
rwlock_read-empty 39.7572 39.8933 -0.34%
rwlock_read-filler 75.109 74.0818 1.4%
rwlock_tryread-empty 5.28944 5.28938 0.0011%
rwlock_tryread-filler 39.6297 39.734 -0.26%
rwlock_write-empty 60.6644 60.6151 0.081%
rwlock_write-filler 92.92 90.0722 3.1%
rwlock_trywrite-empty 7.24741 6.59308 9%
rwlock_trywrite-filler 42.7404 41.6767 2.5%
spin_lock-empty 19.1078 19.1079 -0.00052%
spin_lock-filler 51.0646 51.6041 -1.1%
spin_trylock-empty 16.4707 16.4811 -0.063%
spin_trylock-filler 50.5355 50.4012 0.27%
sem_wait-empty 42.1991 42.1683 0.073%
sem_wait-filler 74.6699 74.7883 -0.16%
sem_trywait-empty 5.27062 5.2702 0.008%
sem_trywait-filler 40.1541 40.1684 -0.036%
condvar-empty 5488.91 5165.95 5.9%
condvar-filler 1442.43 1474.21 -2.2%
consumer_producer-empty 16508.2 16705.3 -1.2%
consumer_producer-filler 16781.1 16942.3 -0.96%
H.J. Lu (4):
Add LLL_MUTEX_READ_LOCK [BZ #28537]
Avoid extra load with CAS in __pthread_mutex_lock_full [BZ #28537]
Reduce CAS in malloc spinlocks
Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537]
malloc/arena.c | 5 +++++
malloc/malloc.c | 10 ++++++++++
nptl/pthread_mutex_lock.c | 17 ++++++++++++-----
nptl/pthread_mutex_timedlock.c | 10 +++++-----
4 files changed, 32 insertions(+), 10 deletions(-)
--
2.33.1
Thread overview: 24+ messages
2021-11-11 16:24 H.J. Lu [this message]
2021-11-11 16:24 ` [PATCH v6 1/4] Add LLL_MUTEX_READ_LOCK " H.J. Lu
2021-11-12 17:23 ` Szabolcs Nagy
2021-11-17 2:24 ` Noah Goldstein
2021-11-17 23:54 ` H.J. Lu
2021-11-18 0:03 ` Noah Goldstein
2021-11-18 0:31 ` H.J. Lu
2021-11-18 1:16 ` Arjan van de Ven
2022-09-11 20:19 ` Sunil Pandey
2022-09-29 0:10 ` Noah Goldstein
2021-11-11 16:24 ` [PATCH v6 2/4] Avoid extra load with CAS in __pthread_mutex_lock_full " H.J. Lu
2021-11-12 16:31 ` Szabolcs Nagy
2021-11-12 18:50 ` Andreas Schwab
2022-09-11 20:16 ` Sunil Pandey
2022-09-29 0:10 ` Noah Goldstein
2021-11-11 16:24 ` [PATCH v6 3/4] Reduce CAS in malloc spinlocks H.J. Lu
2023-02-23 5:48 ` DJ Delorie
2021-11-11 16:24 ` [PATCH v6 4/4] Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537] H.J. Lu
2021-11-12 16:32 ` Szabolcs Nagy
2021-11-12 18:51 ` Andreas Schwab
2022-09-11 20:12 ` Sunil Pandey
2022-09-11 20:15 ` Arjan van de Ven
2022-09-11 21:26 ` Florian Weimer
2022-09-29 0:09 ` Noah Goldstein