* [PATCH v3] x86: Optimize atomic_compare_and_exchange_[val|bool]_acq [BZ #28537]
@ 2021-11-04 16:14 H.J. Lu
  2021-11-08 15:32 ` Noah Goldstein
  2021-11-08 17:36 ` Florian Weimer
From: H.J. Lu @ 2021-11-04 16:14 UTC (permalink / raw)
  To: libc-alpha
  Cc: Florian Weimer, Oleh Derevenko, Arjan van de Ven, Andreas Schwab,
	Hongyu Wang, liuhongt

From the CPU's point of view, getting a cache line for writing is more
expensive than getting it for reading.  See Appendix A.2 Spinlock in:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

The full compare and swap grabs the cache line in exclusive state and
causes excessive cache line bouncing.  Instead, load the current memory
value through a volatile pointer first; this load should be atomic and
will not be optimized out by the compiler.  If the loaded value already
differs from the expected one, return immediately instead of issuing
the compare and swap, which reduces cache line bouncing on contended
locks.
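
For illustration only (not part of the patch, and the function name
try_lock is hypothetical), the same check-before-CAS idea applied to a
simple spinlock acquire:

  static inline int
  try_lock (volatile int *lock)
  {
    /* Plain load: needs the cache line only in shared state.  */
    if (*lock != 0)
      /* Looks locked: skip the CAS and avoid requesting the cache
	 line in exclusive state.  */
      return 0;
    /* The value looks free; only now attempt the full compare and
       swap, which needs the cache line in exclusive state.  Returns
       nonzero on success.  */
    return __sync_bool_compare_and_swap (lock, 0, 1);
  }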

This fixes BZ #28537.

A GCC bug has been opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103065

A fixed compiler should define __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK
to indicate that it generates the check with the volatile load itself.
glibc can then test __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK and skip
the extra volatile load, as sketched below.
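
A sketch of how one of the macros below could then be guarded, assuming
the proposed __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK define (the exact
name and mechanism depend on the eventual GCC fix):

  #ifdef __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK
  /* The compiler already generates the volatile-load check.  */
  # define atomic_compare_and_exchange_val_acq(mem, newval, oldval) \
    __sync_val_compare_and_swap (mem, oldval, newval)
  #else
  /* Do the check by hand, as in this patch.  */
  # define atomic_compare_and_exchange_val_acq(mem, newval, oldval) \
    ({ volatile __typeof (*(mem)) *memp = (mem);		  \
       __typeof (*(mem)) oldmem = *memp, ret;			  \
       ret = (oldmem == (oldval)				  \
	      ? __sync_val_compare_and_swap (mem, oldval, newval) \
	      : oldmem);					  \
       ret; })
  #endif
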
---
 sysdeps/x86/atomic-machine.h | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/sysdeps/x86/atomic-machine.h b/sysdeps/x86/atomic-machine.h
index 2692d94a92..597dc1cf92 100644
--- a/sysdeps/x86/atomic-machine.h
+++ b/sysdeps/x86/atomic-machine.h
@@ -73,9 +73,20 @@ typedef uintmax_t uatomic_max_t;
 #define ATOMIC_EXCHANGE_USES_CAS	0
 
 #define atomic_compare_and_exchange_val_acq(mem, newval, oldval) \
-  __sync_val_compare_and_swap (mem, oldval, newval)
+  ({ volatile __typeof (*(mem)) *memp = (mem);				\
+     __typeof (*(mem)) oldmem = *memp, ret;				\
+     ret = (oldmem == (oldval)						\
+	    ? __sync_val_compare_and_swap (mem, oldval, newval)		\
+	    : oldmem);							\
+     ret; })
 #define atomic_compare_and_exchange_bool_acq(mem, newval, oldval) \
-  (! __sync_bool_compare_and_swap (mem, oldval, newval))
+  ({ volatile __typeof (*(mem)) *memp = (mem);				\
+     __typeof (*(mem)) oldmem = *memp;					\
+     int ret;								\
+     ret = (oldmem == (oldval)						\
+	    ? !__sync_bool_compare_and_swap (mem, oldval, newval)	\
+	    : 1);							\
+     ret; })
 
 
 #define __arch_c_compare_and_exchange_val_8_acq(mem, newval, oldval) \
-- 
2.33.1



