From: "H.J. Lu" <hjl.tools@gmail.com>
To: "Paul A. Clarke" <pc@us.ibm.com>
Cc: Paul E Murphy <murphyp@linux.ibm.com>,
GNU C Library <libc-alpha@sourceware.org>,
Florian Weimer <fweimer@redhat.com>,
Andreas Schwab <schwab@linux-m68k.org>,
Arjan van de Ven <arjan@linux.intel.com>
Subject: Re: [PATCH v4 0/3] Optimize CAS [BZ #28537]
Date: Wed, 10 Nov 2021 15:34:54 -0800
Message-ID: <CAMe9rOpR4wNHOH07KY+JC8o_jqHb4Xspb-cP=Dyxn6+QycTN2Q@mail.gmail.com>
In-Reply-To: <20211110200722.GF4930@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com>
On Wed, Nov 10, 2021 at 12:07 PM Paul A. Clarke <pc@us.ibm.com> wrote:
>
> On Wed, Nov 10, 2021 at 08:26:09AM -0600, Paul E Murphy via Libc-alpha wrote:
> > On 11/9/21 6:16 PM, H.J. Lu via Libc-alpha wrote:
> > > The CAS instruction is expensive. From the x86 CPU's point of view,
> > > getting a cache line for writing is more expensive than reading one.
> > > See Appendix A.2 Spinlock in:
> > >
> > > https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
> > >
> > > The full compare-and-swap grabs the cache line in exclusive state and
> > > causes excessive cache line bouncing.
> > >
> > > Optimize CAS in low level locks and pthread_mutex_lock.c:
> > >
> > > 1. Do an atomic load first and skip the CAS if the compare may fail, to
> > > reduce cache line bouncing on contended locks.
> > > 2. Replace atomic_compare_and_exchange_bool_acq with
> > > atomic_compare_and_exchange_val_acq to avoid the extra load.
> > > 3. Drop __glibc_unlikely in __lll_trylock and lll_cond_trylock since we
> > > don't know whether failure is actually rare; in the contended case it is
> > > clearly not rare.
> >
> > Are you able to share benchmarks of this change? I am curious what effects
> > this might have on other platforms.
>
> I'd like to see the expected performance results, too.
>
> For me, the results are not uniformly positive (Power10).
> From bench-pthread-locks:
>
> benchmark                 bench  bench-patched  improvement
> mutex-empty 4.73371 4.54792 3.9%
> mutex-filler 18.5395 18.3419 1.1%
> mutex_trylock-empty 10.46 2.46364 76.4%
> mutex_trylock-filler 16.2188 16.1758 0.3%
> rwlock_read-empty 16.5118 16.4681 0.3%
> rwlock_read-filler 20.68 20.4416 1.2%
> rwlock_tryread-empty 2.06572 2.17284 -5.2%
> rwlock_tryread-filler 16.082 16.1215 -0.2%
> rwlock_write-empty 31.3723 31.259 0.4%
> rwlock_write-filler 41.6492 69.313 -66.4%
> rwlock_trywrite-empty 2.20584 2.32178 -5.3%
> rwlock_trywrite-filler 15.7044 15.9088 -1.3%
> spin_lock-empty 16.7964 16.7731 0.1%
> spin_lock-filler 20.6118 20.4175 0.9%
> spin_trylock-empty 8.99989 8.98879 0.1%
> spin_trylock-filler 16.4732 15.9957 2.9%
> sem_wait-empty 15.805 15.7391 0.4%
> sem_wait-filler 19.2346 19.5098 -1.4%
> sem_trywait-empty 2.06405 2.03782 1.3%
> sem_trywait-filler 15.921 15.8408 0.5%
> condvar-empty 1385.84 1387.29 -0.1%
> condvar-filler 1419.82 1424.01 -0.3%
> consumer_producer-empty 2550.01 2395.29 6.1%
> consumer_producer-filler 2709.4 2558.28 5.6%
>
> PC
Here are the results on a machine with 112 cores:

benchmark                  bench  bench-patched  improvement
mutex-empty 16.0112 16.5728 -3.5%
mutex-filler 49.4354 48.7608 1.4%
mutex_trylock-empty 19.2854 8.56795 56%
mutex_trylock-filler 54.9643 41.5418 24%
rwlock_read-empty 39.8855 39.7448 0.35%
rwlock_read-filler 75.1334 75.1218 0.015%
rwlock_tryread-empty 5.29094 5.2917 -0.014%
rwlock_tryread-filler 39.6653 40.209 -1.4%
rwlock_write-empty 60.6445 60.6236 0.034%
rwlock_write-filler 91.431 92.9016 -1.6%
rwlock_trywrite-empty 5.28404 5.94623 -13%
rwlock_trywrite-filler 40.7044 40.7709 -0.16%
spin_lock-empty 19.1067 19.1068 -0.00052%
spin_lock-filler 51.643 51.2963 0.67%
spin_trylock-empty 16.4705 16.4707 -0.0012%
spin_trylock-filler 45.4647 50.5047 -11%
sem_wait-empty 42.169 42.1889 -0.047%
sem_wait-filler 74.4302 74.4577 -0.037%
sem_trywait-empty 5.27318 5.27172 0.028%
sem_trywait-filler 40.191 40.8506 -1.6%
condvar-empty 5404.27 5406.39 -0.039%
condvar-filler 5022.93 1566.82 69%
consumer_producer-empty 15899.2 16755.8 -5.4%
consumer_producer-filler 16076.9 16065.8 0.069%
rwlock_trywrite-empty shows a 13% regression and spin_trylock-filler
an 11% regression, but condvar-filler, mutex_trylock-empty and
mutex_trylock-filler improve by 69%, 56% and 24%.
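
In case a sketch helps, item 1 above is essentially the classic
load-before-CAS (test-and-test-and-set) idea. Here is a minimal
sketch using C11 atomics rather than glibc's internal macros, with
illustrative names (this is not the actual glibc code):

  #include <stdatomic.h>

  /* Try to take a lock word that is 0 when free, 1 when held.  */
  static inline int
  try_lock (atomic_int *lock)
  {
    /* Plain load first: it does not request the cache line for
       writing, so spinning waiters do not bounce the line.  */
    int expected = atomic_load_explicit (lock, memory_order_relaxed);
    if (expected != 0)
      return 0;         /* Held: skip the expensive CAS.  */

    /* CAS only when the compare is likely to succeed.  On failure,
       'expected' is updated with the value observed, so no extra
       load is needed -- the same reason item 2 prefers the _val_
       flavor of the glibc macro over the _bool_ flavor.  */
    return atomic_compare_exchange_strong_explicit
      (lock, &expected, 1, memory_order_acquire, memory_order_relaxed);
  }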
--
H.J.