public inbox for libc-alpha@sourceware.org
* [PATCH v6 0/4] Optimize CAS [BZ #28537]
@ 2021-11-11 16:24 H.J. Lu
From: H.J. Lu @ 2021-11-11 16:24 UTC
  To: libc-alpha
  Cc: Florian Weimer, Oleh Derevenko, Arjan van de Ven, Andreas Schwab,
	Paul A . Clarke, Noah Goldstein

Changes in v6:

1. Add LLL_MUTEX_READ_LOCK to do an atomic load and skip CAS in spinlock
loop if compare may fail.
2. Remove low level lock changes.
3. Don't change CAS usages in __pthread_mutex_lock_full.
4. Avoid extra load with CAS in __pthread_mutex_clocklock_common.
5. Reduce CAS in malloc spinlocks.

Changes in v5:

1. Put back __glibc_unlikely in __lll_trylock and lll_cond_trylock.
2. Remove an atomic load in a CAS usage which has been already optimized.
3. Add an empty statement with a semicolon to a goto label for older
compiler versions.
4. Simplify CAS optimization.

The CAS instruction is expensive.  From the x86 CPU's point of view,
getting a cache line for writing is more expensive than getting it for
reading.  See Appendix A.2 Spinlock in:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

The full compare and swap will grab the cache line exclusive and cause
excessive cache line bouncing.
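
As an illustration of the point above, here is a minimal sketch of a
spinlock that probes the lock word with a plain atomic load and only
attempts the CAS when the load says the compare could succeed.  It is
not the code from this series: demo_spinlock, demo_spin_lock and
demo_spin_unlock are hypothetical names, and C11 <stdatomic.h> stands in
for glibc's internal atomic macros.

```c
#include <stdatomic.h>

/* Hypothetical lock word for illustration: 0 = free, 1 = held.  */
typedef struct { atomic_int word; } demo_spinlock;

static void
demo_spin_lock (demo_spinlock *l)
{
  for (;;)
    {
      /* Read-only probe.  A load does not need the cache line in
         exclusive state, so waiters do not steal the line from each
         other while the lock is held.  */
      if (atomic_load_explicit (&l->word, memory_order_relaxed) == 0)
        {
          int expected = 0;
          /* The lock looked free, so the CAS has a chance to succeed.  */
          if (atomic_compare_exchange_weak_explicit (&l->word, &expected, 1,
                                                     memory_order_acquire,
                                                     memory_order_relaxed))
            return;
        }
      /* Lock is busy, or another thread won the race: go back to
         spinning on loads only.  */
    }
}

static void
demo_spin_unlock (demo_spinlock *l)
{
  atomic_store_explicit (&l->word, 0, memory_order_release);
}
```

The load is relaxed because it is only a hint; the acquire ordering that
protects the critical section comes from the successful CAS.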

Optimize CAS in low level locks and pthread_mutex_lock.c:

1. Add LLL_MUTEX_READ_LOCK to do an atomic load and skip CAS in the
spinlock loop if the compare may fail, to reduce cache line bouncing on
contended locks.
2. Replace boolean CAS with value CAS to avoid the extra load.
3. Change malloc spinlocks to do an atomic load and check if the compare
may fail.  Skip CAS and spin if the compare may fail, to reduce cache line
bouncing on contended locks.
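
The boolean-versus-value CAS point in the list above can be sketched as
follows.  This is an illustration under assumptions, not the patch
itself: acquire_bool_cas and acquire_val_cas are hypothetical helpers,
the lock word layout (0 = free, otherwise an owner id) is invented for
the example, and the GCC/Clang __sync and __atomic builtins stand in for
glibc's internal atomic macros.

```c
/* Both helpers return 0 if the lock was acquired, otherwise the owner
   id that was found in the lock word.  */

static int
acquire_bool_cas (int *word, int self)
{
  /* A boolean CAS reports only success or failure...  */
  if (__sync_bool_compare_and_swap (word, 0, self))
    return 0;
  /* ...so learning the current owner costs a second access to the same
     cache line, and the value read here may already differ from the
     one the CAS compared against.  */
  return __atomic_load_n (word, __ATOMIC_RELAXED);
}

static int
acquire_val_cas (int *word, int self)
{
  /* A value CAS returns the value it observed: 0 means the swap
     happened, anything else is the owner seen by the CAS itself.
     No separate load is needed.  */
  return __sync_val_compare_and_swap (word, 0, self);
}
```

With the value form, the caller can branch directly on the returned old
value, for example to decide whether to spin or to block.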

With all CAS optimizations applied, on a machine with 112 cores (the
last column is the relative decrease from the first value to the second):

              mutex-empty    17.4575    17.3908  0.38%
             mutex-filler    48.4768    46.4925  4.1%
      mutex_trylock-empty    19.2726    19.2737  -0.0057%
     mutex_trylock-filler    54.0893     54.105  -0.029%
        rwlock_read-empty    39.7572    39.8933  -0.34%
       rwlock_read-filler     75.109    74.0818  1.4%
     rwlock_tryread-empty    5.28944    5.28938  0.0011%
    rwlock_tryread-filler    39.6297     39.734  -0.26%
       rwlock_write-empty    60.6644    60.6151  0.081%
      rwlock_write-filler      92.92    90.0722  3.1%
    rwlock_trywrite-empty    7.24741    6.59308  9%
   rwlock_trywrite-filler    42.7404    41.6767  2.5%
          spin_lock-empty    19.1078    19.1079  -0.00052%
         spin_lock-filler    51.0646    51.6041  -1.1%
       spin_trylock-empty    16.4707    16.4811  -0.063%
      spin_trylock-filler    50.5355    50.4012  0.27%
           sem_wait-empty    42.1991    42.1683  0.073%
          sem_wait-filler    74.6699    74.7883  -0.16%
        sem_trywait-empty    5.27062     5.2702  0.008%
       sem_trywait-filler    40.1541    40.1684  -0.036%
            condvar-empty    5488.91    5165.95  5.9%
           condvar-filler    1442.43    1474.21  -2.2%
  consumer_producer-empty    16508.2    16705.3  -1.2%
 consumer_producer-filler    16781.1    16942.3  -0.96%

H.J. Lu (4):
  Add LLL_MUTEX_READ_LOCK [BZ #28537]
  Avoid extra load with CAS in __pthread_mutex_lock_full [BZ #28537]
  Reduce CAS in malloc spinlocks
  Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ
    #28537]

 malloc/arena.c                 |  5 +++++
 malloc/malloc.c                | 10 ++++++++++
 nptl/pthread_mutex_lock.c      | 17 ++++++++++++-----
 nptl/pthread_mutex_timedlock.c | 10 +++++-----
 4 files changed, 32 insertions(+), 10 deletions(-)

-- 
2.33.1


Thread overview: 24+ messages
2021-11-11 16:24 [PATCH v6 0/4] Optimize CAS [BZ #28537] H.J. Lu
2021-11-11 16:24 ` [PATCH v6 1/4] Add LLL_MUTEX_READ_LOCK " H.J. Lu
2021-11-12 17:23   ` Szabolcs Nagy
2021-11-17  2:24   ` Noah Goldstein
2021-11-17 23:54     ` H.J. Lu
2021-11-18  0:03       ` Noah Goldstein
2021-11-18  0:31         ` H.J. Lu
2021-11-18  1:16           ` Arjan van de Ven
2022-09-11 20:19             ` Sunil Pandey
2022-09-29  0:10               ` Noah Goldstein
2021-11-11 16:24 ` [PATCH v6 2/4] Avoid extra load with CAS in __pthread_mutex_lock_full " H.J. Lu
2021-11-12 16:31   ` Szabolcs Nagy
2021-11-12 18:50   ` Andreas Schwab
2022-09-11 20:16     ` Sunil Pandey
2022-09-29  0:10       ` Noah Goldstein
2021-11-11 16:24 ` [PATCH v6 3/4] Reduce CAS in malloc spinlocks H.J. Lu
2023-02-23  5:48   ` DJ Delorie
2021-11-11 16:24 ` [PATCH v6 4/4] Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537] H.J. Lu
2021-11-12 16:32   ` Szabolcs Nagy
2021-11-12 18:51   ` Andreas Schwab
2022-09-11 20:12     ` Sunil Pandey
2022-09-11 20:15       ` Arjan van de Ven
2022-09-11 21:26         ` Florian Weimer
2022-09-29  0:09       ` Noah Goldstein
