public inbox for libstdc++@gcc.gnu.org
* [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization
@ 2023-11-16 13:45 Jonathan Wakely
  2023-11-16 13:45 ` [PATCH 2/2] libstdc++: Pass __wait_args to internal API by const pointer Jonathan Wakely
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jonathan Wakely @ 2023-11-16 13:45 UTC (permalink / raw)
  To: libstdc++, gcc-patches; +Cc: Thomas Rodgers, Thomas Rodgers

From: Thomas Rodgers <trodgers@redhat.com>

These two patches were written by Tom earlier this year, before he left
Red Hat. We should finish reviewing them for GCC 14 (and probably squash
them into one?)

Tom, you mentioned further work that changes the __platform_wait_t* to
uintptr_t, is that ready, or likely to be ready very soon?

Tested x86_64-linux, testing underway on powerpc-aix and sparc-solaris.


-- >8 --

This represents a major refactoring of the previous atomic::wait
and atomic::notify implementation detail. The aim of this change
is to simplify the implementation details and position the resulting
implementation so that much of the current header-only detail
can be moved into the shared library, while also accounting for
anticipated changes to wait/notify functionality for C++26.

The previous implementation implemented spin logic in terms of
the types __default_spin_policy, __timed_backoff_spin_policy, and
the free function __atomic_spin. These are replaced by two new
free functions: __spin_impl and __spin_until_impl. These
currently inline free functions are expected to be moved into the
libstdc++ shared library in a future commit.
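For reference, the bounded two-phase spin that __spin_impl performs can be
sketched roughly as below. This is an illustrative standalone approximation,
not the patch's exact code; spin_sketch is a hypothetical name, and the
constants mirror __atomic_spin_count/__atomic_spin_count_relax from
atomic_wait.h:

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Illustrative approximation of __spin_impl: poll the value a bounded number
// of times, first with a CPU relax hint, then yielding the thread. Returns
// {true, new_value} if *addr changed from 'old' before the budget ran out,
// and {false, old} otherwise.
std::pair<bool, int> spin_sketch(const std::atomic<int>* addr, int old)
{
  constexpr int spin_count = 16;       // mirrors __atomic_spin_count
  constexpr int spin_count_relax = 12; // mirrors __atomic_spin_count_relax
  for (int i = 0; i < spin_count; ++i)
    {
      int val = addr->load(std::memory_order_relaxed);
      if (val != old)
	return {true, val};
      if (i < spin_count_relax)
	{ /* __thread_relax(): e.g. the x86 'pause' instruction */ }
      else
	std::this_thread::yield();
    }
  return {false, old};
}
```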

The previous implementation derived untimed and timed wait
implementation detail from __detail::__waiter_pool_base. This
is-a relationship is removed in the new version and the previous
implementation detail is renamed to reflect this change. The
static _S_for member has been renamed as well to indicate that it
returns the __waiter_pool_impl entry in the static 'side table'
for a given awaited address.

This new implementation replaces all of the non-templated waiting
detail of __waiter_base, __waiter_pool, __waiter, __enters_wait, and
__bare_wait with the __wait_impl free function, and the supporting
__wait_flags enum and __wait_args struct. This currently inline free
function is expected to be moved into the libstdc++ shared library
in a future commit.

This new implementation replaces all of the non-templated notifying
detail of __waiter_base, __waiter_pool, and __waiter with the
__notify_impl free function. This currently inline free function
is expected to be moved into the libstdc++ shared library in a
future commit.

The __atomic_wait_address template function is updated to account
for the above changes and to support the expected C++26 change to
pass the most recently observed value to the caller-supplied predicate.

A new non-templated __atomic_wait_address_v free function is added
that works only for atomic types operating directly on __platform_wait_t
and requires the caller to supply a memory order. This is intended
to be the simplest code path for such types.

The __atomic_wait_address_v template function is now implemented in
terms of the new __atomic_wait_address template and continues to
accept a user-supplied "value function" to retrieve the current
value of the atomic.
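The shape of the new predicate loop can be sketched as follows. This is an
illustrative reduction under stated assumptions: wait_address_sketch is a
hypothetical name, and the blocking __wait_impl call between observations is
elided, so the loop simply re-reads the value:

```cpp
// Illustrative reduction of the new __atomic_wait_address loop: the
// predicate is applied to the most recently observed value, matching the
// anticipated C++26 interface. The real implementation blocks in
// __wait_impl between reads; here the loop just re-reads via vfn.
template<typename Tp, typename Pred, typename ValFn>
Tp wait_address_sketch(Pred pred, ValFn vfn)
{
  Tp val = vfn();
  while (!pred(val))
    val = vfn();   // real code: __wait_impl(...), then re-load via vfn
  return val;      // last observed value
}
```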

The __atomic_notify_address template function is updated to account
for the above changes.

The template __platform_wait_until_impl is renamed to
__platform_wait_until. The previous __platform_wait_until template
is deleted and the functionality previously provided is moved to
the new template function __wait_until. A similar change is made
to the __cond_wait_until_impl/__cond_wait_until implementation.

This new implementation similarly replaces all of the non-templated
waiting detail of __timed_waiter_pool, __timed_waiter, etc. with
the new __wait_until_impl free function. This currently inline free
function is expected to be moved into the libstdc++ shared library
in a future commit.

This implementation replaces all templated waiting functions that
manage clock conversion as well as relative waiting (wait_for) with
the new template functions __wait_until and __wait_for.
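One detail of __wait_for worth illustrating in isolation is the rounding of
the caller's relative timeout: chrono::ceil rounds *up* to the wait clock's
duration, so coarser clock granularity never shortens the wait. The sketch
below uses milliseconds for determinism; the real code rounds to
__wait_clock_t::duration, and round_timeout is a hypothetical name:

```cpp
#include <chrono>

// Illustrative: round a relative timeout up to a coarser tick, as
// __wait_for does with chrono::ceil<__wait_clock_t::duration>.
std::chrono::milliseconds round_timeout(std::chrono::microseconds rtime)
{
  return std::chrono::ceil<std::chrono::milliseconds>(rtime);
}
```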

Similarly, the previous implementation detail for the various
__atomic_wait_address_Xxx templates is adjusted to account for the
implementation changes outlined above.

All of the "bare wait" versions of __atomic_wait_Xxx have been removed
and replaced with a defaulted boolean __bare_wait parameter on the
new version of these templates.
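The relationship between the __bare_wait parameter and the new flags can be
sketched with a reduced copy of the enum. The enumerator values are taken
from the patch; has_flag and flags_for are hypothetical helpers standing in
for __wait_args::operator& and _S_flags_for:

```cpp
#include <cstdint>

// Reduced copy of the patch's __wait_flags values. Note that __spin_only
// has the __do_spin bit set, so testing for do_spin also matches spin_only.
enum class wait_flags : std::uint32_t
{
  proxy_wait = 1,
  track_contention = 2,
  do_spin = 4,
  spin_only = 8 | do_spin, // implies do_spin
};

constexpr bool has_flag(std::uint32_t args, wait_flags f)
{ return args & static_cast<std::uint32_t>(f); }

// Mirrors _S_flags_for: a bare wait skips contention tracking, and a type
// that cannot use the platform wait directly gets a proxy wait.
constexpr std::uint32_t flags_for(bool bare_wait, bool proxy)
{
  std::uint32_t res = static_cast<std::uint32_t>(wait_flags::do_spin);
  if (!bare_wait)
    res |= static_cast<std::uint32_t>(wait_flags::track_contention);
  if (proxy)
    res |= static_cast<std::uint32_t>(wait_flags::proxy_wait);
  return res;
}
```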

libstdc++-v3/ChangeLog:

	* include/bits/atomic_timed_wait.h:
	(__detail::__platform_wait_until_impl): Rename to
	__platform_wait_until.
	(__detail::__platform_wait_until): Remove previous
	definition.
	(__detail::__cond_wait_until_impl): Rename to
	__cond_wait_until.
	(__detail::__cond_wait_until): Remove previous
	definition.
	(__detail::__spin_until_impl): New function.
	(__detail::__wait_until_impl): New function.
	(__detail::__wait_until): New function.
	(__detail::__wait_for): New function.
	(__detail::__timed_waiter_pool): Remove type.
	(__detail::__timed_backoff_spin_policy): Remove type.
	(__detail::__timed_waiter): Remove type.
	(__detail::__enters_timed_wait): Remove type alias.
	(__detail::__bare_timed_wait): Remove type alias.
	(__atomic_wait_address_until): Adjust to new implementation
	detail.
	(__atomic_wait_address_until_v): Likewise.
	(__atomic_wait_address_bare): Remove.
	(__atomic_wait_address_for): Adjust to new implementation
	detail.
	(__atomic_wait_address_for_v): Likewise.
	(__atomic_wait_address_for_bare): Remove.
	* include/bits/atomic_wait.h: Include bits/stl_pair.h.
	(__detail::__default_spin_policy): Remove type.
	(__detail::__atomic_spin): Remove function.
	(__detail::__waiter_pool_base): Rename to __waiter_pool_impl.
	Remove _M_notify.  Rename _S_for to _S_impl_for.
	(__detail::__waiter_base): Remove type.
	(__detail::__waiter_pool): Remove type.
	(__detail::__waiter): Remove type.
	(__detail::__enters_wait): Remove type alias.
	(__detail::__bare_wait): Remove type alias.
	(__detail::__wait_flags): New enum.
	(__detail::__wait_args): New struct.
	(__detail::__wait_result_type): New type alias.
	(__detail::__spin_impl): New function.
	(__detail::__wait_impl): New function.
	(__atomic_wait_address): Adjust to new implementation detail.
	(__atomic_wait_address_v): Likewise.
	(__atomic_notify_address): Likewise.
	(__atomic_wait_address_bare): Delete.
	(__atomic_notify_address_bare): Likewise.
	* include/bits/semaphore_base.h: Adjust implementation to
	use new __atomic_wait_address_v contract.
	* include/std/barrier: Adjust implementation to use new
	__atomic_wait contract.
	* include/std/latch: Adjust implementation to use new
	__atomic_wait contract.
	* testsuite/29_atomics/atomic/wait_notify/100334.cc (main):
	Adjust for the __detail::__waiter_pool_base renaming.
---
 libstdc++-v3/include/bits/atomic_timed_wait.h | 549 ++++++++----------
 libstdc++-v3/include/bits/atomic_wait.h       | 486 ++++++++--------
 libstdc++-v3/include/bits/semaphore_base.h    |  53 +-
 libstdc++-v3/include/std/barrier              |   6 +-
 libstdc++-v3/include/std/latch                |   5 +-
 .../29_atomics/atomic/wait_notify/100334.cc   |   4 +-
 6 files changed, 514 insertions(+), 589 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
index ba21ab20a11..90427ef8b42 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -74,62 +74,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
 #define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
     // returns true if wait ended before timeout
-    template<typename _Dur>
-      bool
-      __platform_wait_until_impl(const __platform_wait_t* __addr,
-				 __platform_wait_t __old,
-				 const chrono::time_point<__wait_clock_t, _Dur>&
-				      __atime) noexcept
-      {
-	auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
-	auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
+    bool
+    __platform_wait_until(const __platform_wait_t* __addr,
+			  __platform_wait_t __old,
+			  const __wait_clock_t::time_point& __atime) noexcept
+    {
+      auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
+      auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
 
-	struct timespec __rt =
+      struct timespec __rt =
 	{
 	  static_cast<std::time_t>(__s.time_since_epoch().count()),
 	  static_cast<long>(__ns.count())
 	};
 
-	auto __e = syscall (SYS_futex, __addr,
-			    static_cast<int>(__futex_wait_flags::
-						__wait_bitset_private),
-			    __old, &__rt, nullptr,
-			    static_cast<int>(__futex_wait_flags::
-						__bitset_match_any));
-
-	if (__e)
-	  {
-	    if (errno == ETIMEDOUT)
-	      return false;
-	    if (errno != EINTR && errno != EAGAIN)
-	      __throw_system_error(errno);
-	  }
-	return true;
-      }
-
-    // returns true if wait ended before timeout
-    template<typename _Clock, typename _Dur>
-      bool
-      __platform_wait_until(const __platform_wait_t* __addr, __platform_wait_t __old,
-			    const chrono::time_point<_Clock, _Dur>& __atime)
-      {
-	if constexpr (is_same_v<__wait_clock_t, _Clock>)
-	  {
-	    return __platform_wait_until_impl(__addr, __old, __atime);
-	  }
-	else
-	  {
-	    if (!__platform_wait_until_impl(__addr, __old,
-					    __to_wait_clock(__atime)))
-	      {
-		// We got a timeout when measured against __clock_t but
-		// we need to check against the caller-supplied clock
-		// to tell whether we should return a timeout.
-		if (_Clock::now() < __atime)
-		  return true;
-	      }
+      auto __e = syscall (SYS_futex, __addr,
+			  static_cast<int>(__futex_wait_flags::__wait_bitset_private),
+			  __old, &__rt, nullptr,
+			  static_cast<int>(__futex_wait_flags::__bitset_match_any));
+      if (__e)
+	{
+	  if (errno == ETIMEDOUT)
 	    return false;
-	  }
+	  if (errno != EINTR && errno != EAGAIN)
+	    __throw_system_error(errno);
+	}
+	return true;
       }
 #else
 // define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement __platform_wait_until()
@@ -139,15 +109,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #ifdef _GLIBCXX_HAS_GTHREADS
     // Returns true if wait ended before timeout.
-    // _Clock must be either steady_clock or system_clock.
-    template<typename _Clock, typename _Dur>
-      bool
-      __cond_wait_until_impl(__condvar& __cv, mutex& __mx,
-			     const chrono::time_point<_Clock, _Dur>& __atime)
-      {
-	static_assert(std::__is_one_of<_Clock, chrono::steady_clock,
-					       chrono::system_clock>::value);
-
+    bool
+    __cond_wait_until(__condvar& __cv, mutex& __mx,
+		      const __wait_clock_t::time_point& __atime)
+    {
 	auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
 	auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
 
@@ -158,293 +123,261 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  };
 
 #ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
-	if constexpr (is_same_v<chrono::steady_clock, _Clock>)
+	if constexpr (is_same_v<chrono::steady_clock, __wait_clock_t>)
 	  __cv.wait_until(__mx, CLOCK_MONOTONIC, __ts);
 	else
 #endif
 	  __cv.wait_until(__mx, __ts);
-	return _Clock::now() < __atime;
-      }
-
-    // returns true if wait ended before timeout
-    template<typename _Clock, typename _Dur>
-      bool
-      __cond_wait_until(__condvar& __cv, mutex& __mx,
-	  const chrono::time_point<_Clock, _Dur>& __atime)
-      {
-#ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
-	if constexpr (is_same_v<_Clock, chrono::steady_clock>)
-	  return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
-	else
-#endif
-	if constexpr (is_same_v<_Clock, chrono::system_clock>)
-	  return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
-	else
-	  {
-	    if (__cond_wait_until_impl(__cv, __mx,
-				       __to_wait_clock(__atime)))
-	      {
-		// We got a timeout when measured against __clock_t but
-		// we need to check against the caller-supplied clock
-		// to tell whether we should return a timeout.
-		if (_Clock::now() < __atime)
-		  return true;
-	      }
-	    return false;
-	  }
+	return __wait_clock_t::now() < __atime;
       }
 #endif // _GLIBCXX_HAS_GTHREADS
 
-    struct __timed_waiter_pool : __waiter_pool_base
+    inline __wait_result_type
+    __spin_until_impl(const __platform_wait_t* __addr, __wait_args __args,
+		      const __wait_clock_t::time_point& __deadline)
     {
-      // returns true if wait ended before timeout
-      template<typename _Clock, typename _Dur>
-	bool
-	_M_do_wait_until(__platform_wait_t* __addr, __platform_wait_t __old,
-			 const chrono::time_point<_Clock, _Dur>& __atime)
+      auto __t0 = __wait_clock_t::now();
+      using namespace literals::chrono_literals;
+
+      __platform_wait_t __val;
+      auto __now = __wait_clock_t::now();
+      for (; __now < __deadline; __now = __wait_clock_t::now())
+      {
+	auto __elapsed = __now - __t0;
+#ifndef _GLIBCXX_NO_SLEEP
+	if (__elapsed > 128ms)
+	{
+	  this_thread::sleep_for(64ms);
+	}
+	else if (__elapsed > 64us)
+	{
+	  this_thread::sleep_for(__elapsed / 2);
+	}
+	else
+#endif
+	if (__elapsed > 4us)
+	{
+	  __thread_yield();
+	}
+	else
+	{
+	  auto __res = __detail::__spin_impl(__addr, __args);
+	  if (__res.first)
+	    return __res;
+	}
+
+	__atomic_load(__addr, &__val, __args._M_order);
+	if (__val != __args._M_old)
+	    return make_pair(true, __val);
+      }
+      return make_pair(false, __val);
+    }
+
+    inline __wait_result_type
+    __wait_until_impl(const __platform_wait_t* __addr, __wait_args __args,
+		      const __wait_clock_t::time_point& __atime)
+    {
+#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+      __waiter_pool_impl* __pool = nullptr;
+#else
+      // if we don't have __platform_wait, we always need the side-table
+      __waiter_pool_impl* __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+
+      __platform_wait_t* __wait_addr;
+      if (__args & __wait_flags::__proxy_wait)
 	{
 #ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
-	  return __platform_wait_until(__addr, __old, __atime);
-#else
-	  __platform_wait_t __val;
-	  __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
-	  if (__val == __old)
-	    {
-	      lock_guard<mutex> __l(_M_mtx);
-	      return __cond_wait_until(_M_cv, _M_mtx, __atime);
-	    }
-	  else
-	    return true;
-#endif // _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+	  __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+	  __wait_addr = &__pool->_M_ver;
+	  __atomic_load(__wait_addr, &__args._M_old, __args._M_order);
 	}
-    };
+      else
+	__wait_addr = const_cast<__platform_wait_t*>(__addr);
 
-    struct __timed_backoff_spin_policy
-    {
-      __wait_clock_t::time_point _M_deadline;
-      __wait_clock_t::time_point _M_t0;
+      if (__args & __wait_flags::__do_spin)
+	{
+	  auto __res = __detail::__spin_until_impl(__wait_addr, __args, __atime);
+	  if (__res.first)
+	    return __res;
+	  if (__args & __wait_flags::__spin_only)
+	    return __res;
+	}
 
-      template<typename _Clock, typename _Dur>
-	__timed_backoff_spin_policy(chrono::time_point<_Clock, _Dur>
-				      __deadline = _Clock::time_point::max(),
-				    chrono::time_point<_Clock, _Dur>
-				      __t0 = _Clock::now()) noexcept
-	  : _M_deadline(__to_wait_clock(__deadline))
-	  , _M_t0(__to_wait_clock(__t0))
-	{ }
-
-      bool
-      operator()() const noexcept
+      if (!(__args & __wait_flags::__track_contention))
       {
-	using namespace literals::chrono_literals;
-	auto __now = __wait_clock_t::now();
-	if (_M_deadline <= __now)
-	  return false;
-
-	// FIXME: this_thread::sleep_for not available #ifdef _GLIBCXX_NO_SLEEP
-
-	auto __elapsed = __now - _M_t0;
-	if (__elapsed > 128ms)
-	  {
-	    this_thread::sleep_for(64ms);
-	  }
-	else if (__elapsed > 64us)
-	  {
-	    this_thread::sleep_for(__elapsed / 2);
-	  }
-	else if (__elapsed > 4us)
-	  {
-	    __thread_yield();
-	  }
-	else
-	  return false;
-	return true;
+	// caller does not externally track contention
+#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+	__pool = (__pool == nullptr) ? &__waiter_pool_impl::_S_impl_for(__addr)
+				     : __pool;
+#endif
+	__pool->_M_enter_wait();
       }
-    };
 
-    template<typename _EntersWait>
-      struct __timed_waiter : __waiter_base<__timed_waiter_pool>
+      __wait_result_type __res;
+#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+      if (__platform_wait_until(__wait_addr, __args._M_old, __atime))
+	__res = make_pair(true, __args._M_old);
+      else
+	__res = make_pair(false, __args._M_old);
+#else
+      __platform_wait_t __val;
+      __atomic_load(__wait_addr, &__val, __args._M_order);
+      if (__val == __args._M_old)
+	{
+	   lock_guard<mutex> __l{ __pool->_M_mtx };
+	   __atomic_load(__wait_addr, &__val, __args._M_order);
+	   if (__val == __args._M_old &&
+	       __cond_wait_until(__pool->_M_cv, __pool->_M_mtx, __atime))
+	     __res = make_pair(true, __val);
+	}
+      else
+	__res = make_pair(false, __val);
+#endif
+
+      if (!(__args & __wait_flags::__track_contention))
+	  // caller does not externally track contention
+	  __pool->_M_leave_wait();
+      return __res;
+    }
+
+    template<typename _Clock, typename _Dur>
+      __wait_result_type
+      __wait_until(const __platform_wait_t* __addr, __wait_args __args,
+		   const chrono::time_point<_Clock, _Dur>& __atime) noexcept
       {
-	using __base_type = __waiter_base<__timed_waiter_pool>;
-
-	template<typename _Tp>
-	  __timed_waiter(const _Tp* __addr) noexcept
-	  : __base_type(__addr)
-	{
-	  if constexpr (_EntersWait::value)
-	    _M_w._M_enter_wait();
-	}
-
-	~__timed_waiter()
-	{
-	  if constexpr (_EntersWait::value)
-	    _M_w._M_leave_wait();
-	}
-
-	// returns true if wait ended before timeout
-	template<typename _Tp, typename _ValFn,
-		 typename _Clock, typename _Dur>
-	  bool
-	  _M_do_wait_until_v(_Tp __old, _ValFn __vfn,
-			     const chrono::time_point<_Clock, _Dur>&
-								__atime) noexcept
+	if constexpr (is_same_v<__wait_clock_t, _Clock>)
+	  return __detail::__wait_until_impl(__addr, __args, __atime);
+	else
 	  {
-	    __platform_wait_t __val;
-	    if (_M_do_spin(__old, std::move(__vfn), __val,
-			   __timed_backoff_spin_policy(__atime)))
-	      return true;
-	    return __base_type::_M_w._M_do_wait_until(__base_type::_M_addr, __val, __atime);
-	  }
+	     auto __res = __detail::__wait_until_impl(__addr, __args,
+					    __to_wait_clock(__atime));
+	     if (!__res.first)
+	       {
+		  // We got a timeout when measured against __clock_t but
+		  // we need to check against the caller-supplied clock
+		  // to tell whether we should return a timeout.
+		  if (_Clock::now() < __atime)
+		    return make_pair(true, __res.second);
+		}
+	      return __res;
+	    }
+      }
 
-	// returns true if wait ended before timeout
-	template<typename _Pred,
-		 typename _Clock, typename _Dur>
-	  bool
-	  _M_do_wait_until(_Pred __pred, __platform_wait_t __val,
-			  const chrono::time_point<_Clock, _Dur>&
-							      __atime) noexcept
-	  {
-	    for (auto __now = _Clock::now(); __now < __atime;
-		  __now = _Clock::now())
-	      {
-		if (__base_type::_M_w._M_do_wait_until(
-		      __base_type::_M_addr, __val, __atime)
-		    && __pred())
-		  return true;
-
-		if (__base_type::_M_do_spin(__pred, __val,
-			       __timed_backoff_spin_policy(__atime, __now)))
-		  return true;
-	      }
-	    return false;
-	  }
-
-	// returns true if wait ended before timeout
-	template<typename _Pred,
-		 typename _Clock, typename _Dur>
-	  bool
-	  _M_do_wait_until(_Pred __pred,
-			   const chrono::time_point<_Clock, _Dur>&
-								__atime) noexcept
-	  {
-	    __platform_wait_t __val;
-	    if (__base_type::_M_do_spin(__pred, __val,
-					__timed_backoff_spin_policy(__atime)))
-	      return true;
-	    return _M_do_wait_until(__pred, __val, __atime);
-	  }
-
-	template<typename _Tp, typename _ValFn,
-		 typename _Rep, typename _Period>
-	  bool
-	  _M_do_wait_for_v(_Tp __old, _ValFn __vfn,
-			   const chrono::duration<_Rep, _Period>&
-								__rtime) noexcept
-	  {
-	    __platform_wait_t __val;
-	    if (_M_do_spin_v(__old, std::move(__vfn), __val))
-	      return true;
-
-	    if (!__rtime.count())
-	      return false; // no rtime supplied, and spin did not acquire
-
-	    auto __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
-
-	    return __base_type::_M_w._M_do_wait_until(
-					  __base_type::_M_addr,
-					  __val,
-					  chrono::steady_clock::now() + __reltime);
-	  }
-
-	template<typename _Pred,
-		 typename _Rep, typename _Period>
-	  bool
-	  _M_do_wait_for(_Pred __pred,
-			 const chrono::duration<_Rep, _Period>& __rtime) noexcept
-	  {
-	    __platform_wait_t __val;
-	    if (__base_type::_M_do_spin(__pred, __val))
-	      return true;
-
-	    if (!__rtime.count())
-	      return false; // no rtime supplied, and spin did not acquire
-
-	    auto __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
-
-	    return _M_do_wait_until(__pred, __val,
-				    chrono::steady_clock::now() + __reltime);
-	  }
-      };
-
-    using __enters_timed_wait = __timed_waiter<std::true_type>;
-    using __bare_timed_wait = __timed_waiter<std::false_type>;
+    template<typename _Rep, typename _Period>
+      __wait_result_type
+      __wait_for(const __platform_wait_t* __addr, __wait_args __args,
+		 const chrono::duration<_Rep, _Period>& __rtime) noexcept
+    {
+      if (!__rtime.count())
+	// no rtime supplied, just spin a bit
+	return __detail::__wait_impl(__addr, __args | __wait_flags::__spin_only);
+      auto const __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
+      auto const __atime = chrono::steady_clock::now() + __reltime;
+      return __detail::__wait_until(__addr, __args, __atime);
+    }
   } // namespace __detail
 
   // returns true if wait ended before timeout
+  template<typename _Tp,
+	   typename _Pred, typename _ValFn,
+	   typename _Clock, typename _Dur>
+    bool
+    __atomic_wait_address_until(const _Tp* __addr, _Pred&& __pred,
+				_ValFn&& __vfn,
+				const chrono::time_point<_Clock, _Dur>& __atime,
+				bool __bare_wait = false) noexcept
+    {
+       const auto __wait_addr =
+	   reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
+       __detail::__wait_args __args{ __addr, __bare_wait };
+       _Tp __val = __vfn();
+       while (!__pred(__val))
+	 {
+	   auto __res = __detail::__wait_until(__wait_addr, __args, __atime);
+	   if (!__res.first)
+	     // timed out
+	     return __res.first; // C++26 will also return last observed __val
+	   __val = __vfn();
+	 }
+       return true; // C++26 will also return last observed __val
+    }
+
+  template<typename _Clock, typename _Dur>
+    bool
+    __atomic_wait_address_until_v(const __detail::__platform_wait_t* __addr,
+				  __detail::__platform_wait_t __old,
+				  int __order,
+				  const chrono::time_point<_Clock, _Dur>& __atime,
+				  bool __bare_wait = false) noexcept
+    {
+      __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
+      auto __res = __detail::__wait_until(__addr, __args, __atime);
+      return __res.first; // C++26 will also return last observed __val
+    }
+
   template<typename _Tp, typename _ValFn,
 	   typename _Clock, typename _Dur>
     bool
     __atomic_wait_address_until_v(const _Tp* __addr, _Tp&& __old, _ValFn&& __vfn,
-			const chrono::time_point<_Clock, _Dur>&
-			    __atime) noexcept
+				  const chrono::time_point<_Clock, _Dur>& __atime,
+				  bool __bare_wait = false) noexcept
     {
-      __detail::__enters_timed_wait __w{__addr};
-      return __w._M_do_wait_until_v(__old, __vfn, __atime);
+       auto __pfn = [&](const _Tp& __val)
+	   { return !__detail::__atomic_compare(__old, __val); };
+       return __atomic_wait_address_until(__addr, __pfn, forward<_ValFn>(__vfn),
+					  __atime, __bare_wait);
     }
 
-  template<typename _Tp, typename _Pred,
-	   typename _Clock, typename _Dur>
-    bool
-    __atomic_wait_address_until(const _Tp* __addr, _Pred __pred,
-				const chrono::time_point<_Clock, _Dur>&
-							      __atime) noexcept
-    {
-      __detail::__enters_timed_wait __w{__addr};
-      return __w._M_do_wait_until(__pred, __atime);
-    }
+  template<typename _Tp,
+	   typename _Pred, typename _ValFn,
+	   typename _Rep, typename _Period>
+   bool
+   __atomic_wait_address_for(const _Tp* __addr, _Pred&& __pred,
+			     _ValFn&& __vfn,
+			     const chrono::duration<_Rep, _Period>& __rtime,
+			     bool __bare_wait = false) noexcept
+  {
+      const auto __wait_addr =
+	  reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
+      __detail::__wait_args __args{ __addr, __bare_wait };
+      _Tp __val = __vfn();
+      while (!__pred(__val))
+	{
+	  auto __res = __detail::__wait_for(__wait_addr, __args, __rtime);
+	  if (!__res.first)
+	    // timed out
+	    return __res.first; // C++26 will also return last observed __val
+	  __val = __vfn();
+	}
+      return true; // C++26 will also return last observed __val
+  }
 
-  template<typename _Pred,
-	   typename _Clock, typename _Dur>
+  template<typename _Rep, typename _Period>
     bool
-    __atomic_wait_address_until_bare(const __detail::__platform_wait_t* __addr,
-				_Pred __pred,
-				const chrono::time_point<_Clock, _Dur>&
-							      __atime) noexcept
-    {
-      __detail::__bare_timed_wait __w{__addr};
-      return __w._M_do_wait_until(__pred, __atime);
-    }
+    __atomic_wait_address_for_v(const __detail::__platform_wait_t* __addr,
+				__detail::__platform_wait_t __old,
+				int __order,
+				const chrono::duration<_Rep, _Period>& __rtime,
+				bool __bare_wait = false) noexcept
+  {
+    __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
+    auto __res = __detail::__wait_for(__addr, __args, __rtime);
+    return __res.first; // C++26 will also return last observed __val
+  }
 
   template<typename _Tp, typename _ValFn,
 	   typename _Rep, typename _Period>
     bool
     __atomic_wait_address_for_v(const _Tp* __addr, _Tp&& __old, _ValFn&& __vfn,
-		      const chrono::duration<_Rep, _Period>& __rtime) noexcept
+				const chrono::duration<_Rep, _Period>& __rtime,
+				bool __bare_wait = false) noexcept
     {
-      __detail::__enters_timed_wait __w{__addr};
-      return __w._M_do_wait_for_v(__old, __vfn, __rtime);
-    }
-
-  template<typename _Tp, typename _Pred,
-	   typename _Rep, typename _Period>
-    bool
-    __atomic_wait_address_for(const _Tp* __addr, _Pred __pred,
-		      const chrono::duration<_Rep, _Period>& __rtime) noexcept
-    {
-
-      __detail::__enters_timed_wait __w{__addr};
-      return __w._M_do_wait_for(__pred, __rtime);
-    }
-
-  template<typename _Pred,
-	   typename _Rep, typename _Period>
-    bool
-    __atomic_wait_address_for_bare(const __detail::__platform_wait_t* __addr,
-			_Pred __pred,
-			const chrono::duration<_Rep, _Period>& __rtime) noexcept
-    {
-      __detail::__bare_timed_wait __w{__addr};
-      return __w._M_do_wait_for(__pred, __rtime);
+      auto __pfn = [&](const _Tp& __val)
+	  { return !__detail::__atomic_compare(__old, __val); };
+      return __atomic_wait_address_for(__addr, __pfn, forward<_ValFn>(__vfn),
+				       __rtime, __bare_wait);
     }
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
index 506b7f0cc9d..63d132fc9a8 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -47,7 +47,8 @@
 # include <bits/functexcept.h>
 #endif
 
-# include <bits/std_mutex.h>  // std::mutex, std::__condvar
+#include <bits/stl_pair.h>
+#include <bits/std_mutex.h>  // std::mutex, std::__condvar
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -131,7 +132,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     __thread_yield() noexcept
     {
 #if defined _GLIBCXX_HAS_GTHREADS && defined _GLIBCXX_USE_SCHED_YIELD
-     __gthread_yield();
+      __gthread_yield();
 #endif
     }
 
@@ -148,38 +149,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     inline constexpr auto __atomic_spin_count_relax = 12;
     inline constexpr auto __atomic_spin_count = 16;
 
-    struct __default_spin_policy
-    {
-      bool
-      operator()() const noexcept
-      { return false; }
-    };
-
-    template<typename _Pred,
-	     typename _Spin = __default_spin_policy>
-      bool
-      __atomic_spin(_Pred& __pred, _Spin __spin = _Spin{ }) noexcept
-      {
-	for (auto __i = 0; __i < __atomic_spin_count; ++__i)
-	  {
-	    if (__pred())
-	      return true;
-
-	    if (__i < __atomic_spin_count_relax)
-	      __detail::__thread_relax();
-	    else
-	      __detail::__thread_yield();
-	  }
-
-	while (__spin())
-	  {
-	    if (__pred())
-	      return true;
-	  }
-
-	return false;
-      }
-
     // return true if equal
     template<typename _Tp>
       bool __atomic_compare(const _Tp& __a, const _Tp& __b)
@@ -188,7 +157,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0;
       }
 
-    struct __waiter_pool_base
+    struct __waiter_pool_impl
     {
       // Don't use std::hardware_destructive_interference_size here because we
       // don't want the layout of library types to depend on compiler options.
@@ -205,7 +174,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
       __condvar _M_cv;
 #endif
-      __waiter_pool_base() = default;
+      __waiter_pool_impl() = default;
 
       void
       _M_enter_wait() noexcept
@@ -223,256 +192,271 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __res != 0;
       }
 
-      void
-      _M_notify(__platform_wait_t* __addr, [[maybe_unused]] bool __all,
-		bool __bare) noexcept
-      {
-#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-	if (__addr == &_M_ver)
-	  {
-	    __atomic_fetch_add(__addr, 1, __ATOMIC_SEQ_CST);
-	    __all = true;
-	  }
-
-	if (__bare || _M_waiting())
-	  __platform_notify(__addr, __all);
-#else
-	{
-	  lock_guard<mutex> __l(_M_mtx);
-	  __atomic_fetch_add(__addr, 1, __ATOMIC_RELAXED);
-	}
-	if (__bare || _M_waiting())
-	  _M_cv.notify_all();
-#endif
-      }
-
-      static __waiter_pool_base&
-      _S_for(const void* __addr) noexcept
+      static __waiter_pool_impl&
+      _S_impl_for(const void* __addr) noexcept
       {
 	constexpr uintptr_t __ct = 16;
-	static __waiter_pool_base __w[__ct];
+	static __waiter_pool_impl __w[__ct];
 	auto __key = (uintptr_t(__addr) >> 2) % __ct;
 	return __w[__key];
       }
     };
 
-    struct __waiter_pool : __waiter_pool_base
+    enum class __wait_flags : uint32_t
     {
-      void
-      _M_do_wait(const __platform_wait_t* __addr, __platform_wait_t __old) noexcept
-      {
-#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-	__platform_wait(__addr, __old);
-#else
-	__platform_wait_t __val;
-	__atomic_load(__addr, &__val, __ATOMIC_SEQ_CST);
-	if (__val == __old)
-	  {
-	    lock_guard<mutex> __l(_M_mtx);
-	    __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
-	    if (__val == __old)
-	      _M_cv.wait(_M_mtx);
-	  }
-#endif // __GLIBCXX_HAVE_PLATFORM_WAIT
-      }
+       __abi_version = 0,
+       __proxy_wait = 1,
+       __track_contention = 2,
+       __do_spin = 4,
+       __spin_only = 8 | __do_spin, // implies __do_spin
+       __abi_version_mask = 0xffff0000,
     };
 
-    template<typename _Tp>
-      struct __waiter_base
+    struct __wait_args
+    {
+      __platform_wait_t _M_old;
+      int _M_order = __ATOMIC_ACQUIRE;
+      __wait_flags _M_flags;
+
+      template<typename _Tp>
+	explicit __wait_args(const _Tp* __addr,
+			     bool __bare_wait = false) noexcept
+	    : _M_flags{ _S_flags_for(__addr, __bare_wait) }
+	{ }
+
+      __wait_args(const __platform_wait_t* __addr, __platform_wait_t __old,
+		  int __order, bool __bare_wait = false) noexcept
+	  : _M_old{ __old }
+	  , _M_order{ __order }
+	  , _M_flags{ _S_flags_for(__addr, __bare_wait) }
+	{ }
+
+      __wait_args(const __wait_args&) noexcept = default;
+      __wait_args&
+      operator=(const __wait_args&) noexcept = default;
+
+      bool
+      operator&(__wait_flags __flag) const noexcept
       {
-	using __waiter_type = _Tp;
+	 using __t = underlying_type_t<__wait_flags>;
+	 return static_cast<__t>(_M_flags)
+	     & static_cast<__t>(__flag);
+      }
 
-	__waiter_type& _M_w;
-	__platform_wait_t* _M_addr;
+      __wait_args
+      operator|(__wait_flags __flag) const noexcept
+      {
+	using __t = underlying_type_t<__wait_flags>;
+	__wait_args __res{ *this };
+	const auto __flags = static_cast<__t>(__res._M_flags)
+			     | static_cast<__t>(__flag);
+	__res._M_flags = __wait_flags{ __flags };
+	return __res;
+      }
 
-	template<typename _Up>
-	  static __platform_wait_t*
-	  _S_wait_addr(const _Up* __a, __platform_wait_t* __b)
-	  {
-	    if constexpr (__platform_wait_uses_type<_Up>)
-	      return reinterpret_cast<__platform_wait_t*>(const_cast<_Up*>(__a));
-	    else
-	      return __b;
-	  }
+    private:
+      static int
+      constexpr _S_default_flags() noexcept
+      {
+	using __t = underlying_type_t<__wait_flags>;
+	return static_cast<__t>(__wait_flags::__abi_version)
+		| static_cast<__t>(__wait_flags::__do_spin);
+      }
 
-	static __waiter_type&
-	_S_for(const void* __addr) noexcept
+      template<typename _Tp>
+	static int
+	constexpr _S_flags_for(const _Tp*, bool __bare_wait) noexcept
 	{
-	  static_assert(sizeof(__waiter_type) == sizeof(__waiter_pool_base));
-	  auto& res = __waiter_pool_base::_S_for(__addr);
-	  return reinterpret_cast<__waiter_type&>(res);
+	  auto __res = _S_default_flags();
+	  if (!__bare_wait)
+	    __res |= static_cast<int>(__wait_flags::__track_contention);
+	  if constexpr (!__platform_wait_uses_type<_Tp>)
+	    __res |= static_cast<int>(__wait_flags::__proxy_wait);
+	  return __res;
 	}
 
-	template<typename _Up>
-	  explicit __waiter_base(const _Up* __addr) noexcept
-	    : _M_w(_S_for(__addr))
-	    , _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
-	  { }
-
-	void
-	_M_notify(bool __all, bool __bare = false) noexcept
-	{ _M_w._M_notify(_M_addr, __all, __bare); }
-
-	template<typename _Up, typename _ValFn,
-		 typename _Spin = __default_spin_policy>
-	  static bool
-	  _S_do_spin_v(__platform_wait_t* __addr,
-		       const _Up& __old, _ValFn __vfn,
-		       __platform_wait_t& __val,
-		       _Spin __spin = _Spin{ })
-	  {
-	    auto const __pred = [=]
-	      { return !__detail::__atomic_compare(__old, __vfn()); };
-
-	    if constexpr (__platform_wait_uses_type<_Up>)
-	      {
-		__builtin_memcpy(&__val, &__old, sizeof(__val));
-	      }
-	    else
-	      {
-		__atomic_load(__addr, &__val, __ATOMIC_ACQUIRE);
-	      }
-	    return __atomic_spin(__pred, __spin);
-	  }
-
-	template<typename _Up, typename _ValFn,
-		 typename _Spin = __default_spin_policy>
-	  bool
-	  _M_do_spin_v(const _Up& __old, _ValFn __vfn,
-		       __platform_wait_t& __val,
-		       _Spin __spin = _Spin{ })
-	  { return _S_do_spin_v(_M_addr, __old, __vfn, __val, __spin); }
-
-	template<typename _Pred,
-		 typename _Spin = __default_spin_policy>
-	  static bool
-	  _S_do_spin(const __platform_wait_t* __addr,
-		     _Pred __pred,
-		     __platform_wait_t& __val,
-		     _Spin __spin = _Spin{ })
-	  {
-	    __atomic_load(__addr, &__val, __ATOMIC_ACQUIRE);
-	    return __atomic_spin(__pred, __spin);
-	  }
-
-	template<typename _Pred,
-		 typename _Spin = __default_spin_policy>
-	  bool
-	  _M_do_spin(_Pred __pred, __platform_wait_t& __val,
-		     _Spin __spin = _Spin{ })
-	  { return _S_do_spin(_M_addr, __pred, __val, __spin); }
-      };
-
-    template<typename _EntersWait>
-      struct __waiter : __waiter_base<__waiter_pool>
-      {
-	using __base_type = __waiter_base<__waiter_pool>;
-
-	template<typename _Tp>
-	  explicit __waiter(const _Tp* __addr) noexcept
-	    : __base_type(__addr)
-	  {
-	    if constexpr (_EntersWait::value)
-	      _M_w._M_enter_wait();
-	  }
-
-	~__waiter()
+      template<typename _Tp>
+	static int
+	_S_memory_order_for(const _Tp*, int __order) noexcept
 	{
-	  if constexpr (_EntersWait::value)
-	    _M_w._M_leave_wait();
+	  if constexpr (__platform_wait_uses_type<_Tp>)
+	    return __order;
+	  return __ATOMIC_ACQUIRE;
+	}
+    };
+
+    using __wait_result_type = pair<bool, __platform_wait_t>;
+    inline __wait_result_type
+    __spin_impl(const __platform_wait_t* __addr, __wait_args __args)
+    {
+      __platform_wait_t __val;
+      for (auto __i = 0; __i < __atomic_spin_count; ++__i)
+	{
+	  __atomic_load(__addr, &__val, __args._M_order);
+	  if (__val != __args._M_old)
+	    return make_pair(true, __val);
+	  if (__i < __atomic_spin_count_relax)
+	    __detail::__thread_relax();
+	  else
+	    __detail::__thread_yield();
+	}
+      return make_pair(false, __val);
+    }
+
+    inline __wait_result_type
+    __wait_impl(const __platform_wait_t* __addr, __wait_args __args)
+    {
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+      __waiter_pool_impl* __pool = nullptr;
+#else
+      // if we don't have __platform_wait, we always need the side-table
+      __waiter_pool_impl* __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+
+      __platform_wait_t* __wait_addr;
+      if (__args & __wait_flags::__proxy_wait)
+	{
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+	  __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+	  __wait_addr = &__pool->_M_ver;
+	  __atomic_load(__wait_addr, &__args._M_old, __args._M_order);
+	}
+      else
+	__wait_addr = const_cast<__platform_wait_t*>(__addr);
+
+      if (__args & __wait_flags::__do_spin)
+	{
+	  auto __res = __detail::__spin_impl(__wait_addr, __args);
+	  if (__res.first)
+	    return __res;
+	  if (__args & __wait_flags::__spin_only)
+	    return __res;
 	}
 
-	template<typename _Tp, typename _ValFn>
-	  void
-	  _M_do_wait_v(_Tp __old, _ValFn __vfn)
-	  {
-	    do
-	      {
-		__platform_wait_t __val;
-		if (__base_type::_M_do_spin_v(__old, __vfn, __val))
-		  return;
-		__base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
-	      }
-	    while (__detail::__atomic_compare(__old, __vfn()));
-	  }
+      if (!(__args & __wait_flags::__track_contention))
+	{
+	  // caller does not externally track contention
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+	  __pool = (__pool == nullptr) ? &__waiter_pool_impl::_S_impl_for(__addr)
+				       : __pool;
+#endif
+	  __pool->_M_enter_wait();
+	}
 
-	template<typename _Pred>
-	  void
-	  _M_do_wait(_Pred __pred) noexcept
-	  {
-	    do
-	      {
-		__platform_wait_t __val;
-		if (__base_type::_M_do_spin(__pred, __val))
-		  return;
-		__base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
-	      }
-	    while (!__pred());
-	  }
-      };
+      __wait_result_type __res;
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+      __platform_wait(__wait_addr, __args._M_old);
+      __res = make_pair(false, __args._M_old);
+#else
+      __platform_wait_t __val;
+      __atomic_load(__wait_addr, &__val, __args._M_order);
+      if (__val == __args._M_old)
+	{
+	    lock_guard<mutex> __l{ __pool->_M_mtx };
+	    __atomic_load(__wait_addr, &__val, __args._M_order);
+	    if (__val == __args._M_old)
+		__pool->_M_cv.wait(__pool->_M_mtx);
+	}
+      __res = make_pair(false, __val);
+#endif
 
-    using __enters_wait = __waiter<std::true_type>;
-    using __bare_wait = __waiter<std::false_type>;
+      if (!(__args & __wait_flags::__track_contention))
+	// caller does not externally track contention
+	__pool->_M_leave_wait();
+      return __res;
+    }
+
+    inline void
+    __notify_impl(const __platform_wait_t* __addr, [[maybe_unused]] bool __all,
+		  __wait_args __args)
+    {
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+      __waiter_pool_impl* __pool = nullptr;
+#else
+      // if we don't have __platform_notify, we always need the side-table
+      __waiter_pool_impl* __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+
+      if (!(__args & __wait_flags::__track_contention))
+	{
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+	  __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+	  if (!__pool->_M_waiting())
+	    return;
+	}
+
+      __platform_wait_t* __wait_addr;
+      if (__args & __wait_flags::__proxy_wait)
+	{
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+	   __pool = (__pool == nullptr) ? &__waiter_pool_impl::_S_impl_for(__addr)
+					: __pool;
+#endif
+	   __wait_addr = &__pool->_M_ver;
+	   __atomic_fetch_add(__wait_addr, 1, __ATOMIC_RELAXED);
+	   __all = true;
+	 }
+
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+      __platform_notify(__wait_addr, __all);
+#else
+      lock_guard<mutex> __l{ __pool->_M_mtx };
+      __pool->_M_cv.notify_all();
+#endif
+    }
   } // namespace __detail
 
+  template<typename _Tp,
+	   typename _Pred, typename _ValFn>
+    void
+    __atomic_wait_address(const _Tp* __addr,
+			  _Pred&& __pred, _ValFn&& __vfn,
+			  bool __bare_wait = false) noexcept
+    {
+      const auto __wait_addr =
+	  reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
+      __detail::__wait_args __args{ __addr, __bare_wait };
+      _Tp __val = __vfn();
+      while (!__pred(__val))
+	{
+	  __detail::__wait_impl(__wait_addr, __args);
+	  __val = __vfn();
+	}
+      // C++26 will return __val
+    }
+
+  inline void
+  __atomic_wait_address_v(const __detail::__platform_wait_t* __addr,
+			  __detail::__platform_wait_t __old,
+			  int __order)
+  {
+    __detail::__wait_args __args{ __addr, __old, __order };
+    // C++26 will not ignore the return value here
+    __detail::__wait_impl(__addr, __args);
+  }
+
   template<typename _Tp, typename _ValFn>
     void
     __atomic_wait_address_v(const _Tp* __addr, _Tp __old,
 			    _ValFn __vfn) noexcept
     {
-      __detail::__enters_wait __w(__addr);
-      __w._M_do_wait_v(__old, __vfn);
-    }
-
-  template<typename _Tp, typename _Pred>
-    void
-    __atomic_wait_address(const _Tp* __addr, _Pred __pred) noexcept
-    {
-      __detail::__enters_wait __w(__addr);
-      __w._M_do_wait(__pred);
-    }
-
-  // This call is to be used by atomic types which track contention externally
-  template<typename _Pred>
-    void
-    __atomic_wait_address_bare(const __detail::__platform_wait_t* __addr,
-			       _Pred __pred) noexcept
-    {
-#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-      do
-	{
-	  __detail::__platform_wait_t __val;
-	  if (__detail::__bare_wait::_S_do_spin(__addr, __pred, __val))
-	    return;
-	  __detail::__platform_wait(__addr, __val);
-	}
-      while (!__pred());
-#else // !_GLIBCXX_HAVE_PLATFORM_WAIT
-      __detail::__bare_wait __w(__addr);
-      __w._M_do_wait(__pred);
-#endif
+      auto __pfn = [&](const _Tp& __val)
+	  { return !__detail::__atomic_compare(__old, __val); };
+      __atomic_wait_address(__addr, __pfn, forward<_ValFn>(__vfn));
     }
 
   template<typename _Tp>
     void
-    __atomic_notify_address(const _Tp* __addr, bool __all) noexcept
+    __atomic_notify_address(const _Tp* __addr, bool __all,
+			    bool __bare_wait = false) noexcept
     {
-      __detail::__bare_wait __w(__addr);
-      __w._M_notify(__all);
+      const auto __wait_addr =
+	  reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
+      __detail::__wait_args __args{ __addr, __bare_wait };
+      __detail::__notify_impl(__wait_addr, __all, __args);
     }
-
-  // This call is to be used by atomic types which track contention externally
-  inline void
-  __atomic_notify_address_bare(const __detail::__platform_wait_t* __addr,
-			       bool __all) noexcept
-  {
-#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-    __detail::__platform_notify(__addr, __all);
-#else
-    __detail::__bare_wait __w(__addr);
-    __w._M_notify(__all, true);
-#endif
-  }
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 #endif // __glibcxx_atomic_wait
diff --git a/libstdc++-v3/include/bits/semaphore_base.h b/libstdc++-v3/include/bits/semaphore_base.h
index 7e07b308cdc..f137c9df681 100644
--- a/libstdc++-v3/include/bits/semaphore_base.h
+++ b/libstdc++-v3/include/bits/semaphore_base.h
@@ -194,10 +194,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     __atomic_semaphore(const __atomic_semaphore&) = delete;
     __atomic_semaphore& operator=(const __atomic_semaphore&) = delete;
 
-    static _GLIBCXX_ALWAYS_INLINE bool
-    _S_do_try_acquire(__detail::__platform_wait_t* __counter) noexcept
+    static _GLIBCXX_ALWAYS_INLINE __detail::__platform_wait_t
+    _S_get_current(__detail::__platform_wait_t* __counter) noexcept
+    {
+      return __atomic_impl::load(__counter, memory_order::acquire);
+    }
+
+    static _GLIBCXX_ALWAYS_INLINE bool
+    _S_do_try_acquire(__detail::__platform_wait_t* __counter,
+		      __detail::__platform_wait_t __old) noexcept
     {
-      auto __old = __atomic_impl::load(__counter, memory_order::acquire);
       if (__old == 0)
 	return false;
 
@@ -210,17 +216,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _GLIBCXX_ALWAYS_INLINE void
     _M_acquire() noexcept
     {
-      auto const __pred =
-	[this] { return _S_do_try_acquire(&this->_M_counter); };
-      std::__atomic_wait_address_bare(&_M_counter, __pred);
+      auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
+      auto const __pred = [this](__detail::__platform_wait_t __cur)
+	{ return _S_do_try_acquire(&this->_M_counter, __cur); };
+      std::__atomic_wait_address(&_M_counter, __pred, __vfn, true);
     }
 
     bool
     _M_try_acquire() noexcept
     {
-      auto const __pred =
-	[this] { return _S_do_try_acquire(&this->_M_counter); };
-      return std::__detail::__atomic_spin(__pred);
+      auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
+      auto const __pred = [this](__detail::__platform_wait_t __cur)
+	{ return _S_do_try_acquire(&this->_M_counter, __cur); };
+      return __atomic_wait_address_for(&_M_counter, __pred, __vfn,
+					 __detail::__wait_clock_t::duration(),
+					 true);
     }
 
     template<typename _Clock, typename _Duration>
@@ -228,21 +238,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _M_try_acquire_until(const chrono::time_point<_Clock,
 			   _Duration>& __atime) noexcept
       {
-	auto const __pred =
-	  [this] { return _S_do_try_acquire(&this->_M_counter); };
-
-	return __atomic_wait_address_until_bare(&_M_counter, __pred, __atime);
+	auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
+	auto const __pred = [this](__detail::__platform_wait_t __cur)
+	 { return _S_do_try_acquire(&this->_M_counter, __cur); };
+	return std::__atomic_wait_address_until(&_M_counter,
+						__pred, __vfn, __atime, true);
       }
 
     template<typename _Rep, typename _Period>
       _GLIBCXX_ALWAYS_INLINE bool
-      _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime)
-	noexcept
+      _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime) noexcept
       {
-	auto const __pred =
-	  [this] { return _S_do_try_acquire(&this->_M_counter); };
-
-	return __atomic_wait_address_for_bare(&_M_counter, __pred, __rtime);
+	auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
+	auto const __pred = [this](__detail::__platform_wait_t __cur)
+	 { return _S_do_try_acquire(&this->_M_counter, __cur); };
+	return std::__atomic_wait_address_for(&_M_counter,
+					      __pred, __vfn, __rtime, true);
       }
 
     _GLIBCXX_ALWAYS_INLINE void
@@ -251,9 +262,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       if (0 < __atomic_impl::fetch_add(&_M_counter, __update, memory_order_release))
 	return;
       if (__update > 1)
-	__atomic_notify_address_bare(&_M_counter, true);
+	__atomic_notify_address(&_M_counter, true, true);
       else
-	__atomic_notify_address_bare(&_M_counter, true);
+	__atomic_notify_address(&_M_counter, true, true);
 // FIXME - Figure out why this does not wake a waiting thread
 //	__atomic_notify_address_bare(&_M_counter, false);
     }
diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
index 8e03a5820c6..08286fd95d2 100644
--- a/libstdc++-v3/include/std/barrier
+++ b/libstdc++-v3/include/std/barrier
@@ -192,11 +192,7 @@ It looks different from literature pseudocode for two main reasons:
       wait(arrival_token&& __old_phase) const
       {
 	__atomic_phase_const_ref_t __phase(_M_phase);
-	auto const __test_fn = [=]
-	  {
-	    return __phase.load(memory_order_acquire) != __old_phase;
-	  };
-	std::__atomic_wait_address(&_M_phase, __test_fn);
+	__phase.wait(__old_phase, memory_order_acquire);
       }
 
       void
diff --git a/libstdc++-v3/include/std/latch b/libstdc++-v3/include/std/latch
index 27cd80d7100..525647c1701 100644
--- a/libstdc++-v3/include/std/latch
+++ b/libstdc++-v3/include/std/latch
@@ -74,8 +74,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _GLIBCXX_ALWAYS_INLINE void
     wait() const noexcept
     {
-      auto const __pred = [this] { return this->try_wait(); };
-      std::__atomic_wait_address(&_M_a, __pred);
+        auto const __vfn = [this] { return this->try_wait(); };
+        auto const __pred = [this](bool __b) { return __b; };
+        std::__atomic_wait_address(&_M_a, __pred, __vfn);
     }
 
     _GLIBCXX_ALWAYS_INLINE void
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc
index 78e718e4597..aa1d57d1457 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc
@@ -54,8 +54,8 @@ main()
     atom->store(0);
   }
 
-  auto a = &std::__detail::__waiter_pool_base::_S_for(reinterpret_cast<char *>(atomics.a[0]));
-  auto b = &std::__detail::__waiter_pool_base::_S_for(reinterpret_cast<char *>(atomics.a[1]));
+  auto a = &std::__detail::__waiter_pool_impl::_S_impl_for(reinterpret_cast<char *>(atomics.a[0]));
+  auto b = &std::__detail::__waiter_pool_impl::_S_impl_for(reinterpret_cast<char *>(atomics.a[1]));
   VERIFY( a == b );
 
   auto fut0 = std::async(std::launch::async, [&] { atomics.a[0]->wait(0); });
-- 
2.41.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] libstdc++: Pass __wait_args to internal API by const pointer
  2023-11-16 13:45 [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization Jonathan Wakely
@ 2023-11-16 13:45 ` Jonathan Wakely
  2023-11-16 20:46 ` [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization Jonathan Wakely
  2024-03-09 12:18 ` Jonathan Wakely
  2 siblings, 0 replies; 10+ messages in thread
From: Jonathan Wakely @ 2023-11-16 13:45 UTC (permalink / raw)
  To: libstdc++, gcc-patches; +Cc: Thomas Rodgers, Thomas Rodgers

From: Thomas Rodgers <trodgers@redhat.com>

Tested x86_64-linux, testing underway on powerpc-aix and sparc-solaris.

-- >8 --

This change splits the __wait_args data members out into a new struct
__wait_args_base, which is then passed by const pointer to the low-level
implementation functions.

libstdc++-v3/ChangeLog:

	* include/bits/atomic_timed_wait.h (__spin_until_impl): Accept
	__wait_args as const __wait_args_base*.
	(__wait_until_impl): Likewise.
	(__wait_until): Likewise.
	(__wait_for): Likewise.
	(__atomic_wait_address_until): Pass __wait_args by address.
	(__atomic_wait_address_for): Likewise.
	* include/bits/atomic_wait.h (__wait_args_base): New struct.
	(__wait_args): Derive from __wait_args_base.
	(__wait_args::__wait_args()): Adjust ctors to call base ctor.
	(__wait_args::__wait_args(const __wait_args_base&)): New ctor.
	(__wait_args::operator|=): New method.
	(__wait_args::_S_flags_for): Change return type to
	__wait_flags.
	(__spin_impl): Accept __wait_args as const __wait_args_base*.
	(__wait_impl): Likewise.
	(__notify_impl): Likewise.
	(__atomic_wait_address): Pass __wait_args by address.
	(__atomic_wait_address_v): Likewise.
	(__atomic_notify_address): Likewise.
---
 libstdc++-v3/include/bits/atomic_timed_wait.h | 35 +++++++----
 libstdc++-v3/include/bits/atomic_wait.h       | 62 +++++++++++++------
 2 files changed, 64 insertions(+), 33 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
index 90427ef8b42..6f6e8758ee2 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -133,9 +133,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif // _GLIBCXX_HAS_GTHREADS
 
     inline __wait_result_type
-    __spin_until_impl(const __platform_wait_t* __addr, __wait_args __args,
+    __spin_until_impl(const __platform_wait_t* __addr,
+		      const __wait_args_base* __a,
 		      const __wait_clock_t::time_point& __deadline)
     {
+      __wait_args __args{ *__a };
       auto __t0 = __wait_clock_t::now();
       using namespace literals::chrono_literals;
 
@@ -161,7 +163,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
 	else
 	{
-	  auto __res = __detail::__spin_impl(__addr, __args);
+	  auto __res = __detail::__spin_impl(__addr, __a);
 	  if (__res.first)
 	    return __res;
 	}
@@ -174,9 +176,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     }
 
     inline __wait_result_type
-    __wait_until_impl(const __platform_wait_t* __addr, __wait_args __args,
+    __wait_until_impl(const __platform_wait_t* __addr,
+		      const __wait_args_base* __a,
 		      const __wait_clock_t::time_point& __atime)
     {
+      __wait_args __args{ *__a };
 #ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
       __waiter_pool_impl* __pool = nullptr;
 #else
@@ -198,7 +202,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
       if (__args & __wait_flags::__do_spin)
 	{
-	  auto __res = __detail::__spin_until_impl(__wait_addr, __args, __atime);
+	  auto __res = __detail::__spin_until_impl(__wait_addr, __a, __atime);
 	  if (__res.first)
 	    return __res;
 	  if (__args & __wait_flags::__spin_only)
@@ -244,7 +248,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
     template<typename _Clock, typename _Dur>
       __wait_result_type
-      __wait_until(const __platform_wait_t* __addr, __wait_args __args,
+      __wait_until(const __platform_wait_t* __addr, const __wait_args* __args,
 		   const chrono::time_point<_Clock, _Dur>& __atime) noexcept
       {
 	if constexpr (is_same_v<__wait_clock_t, _Clock>)
@@ -267,15 +271,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
     template<typename _Rep, typename _Period>
       __wait_result_type
-      __wait_for(const __platform_wait_t* __addr, __wait_args __args,
+      __wait_for(const __platform_wait_t* __addr, const __wait_args_base* __a,
 		 const chrono::duration<_Rep, _Period>& __rtime) noexcept
     {
+      __wait_args __args{ *__a };
       if (!__rtime.count())
-	// no rtime supplied, just spin a bit
-	return __detail::__wait_impl(__addr, __args | __wait_flags::__spin_only);
+	{
+	  // no rtime supplied, just spin a bit
+	  __args |= __wait_flags::__spin_only;
+	  return __detail::__wait_impl(__addr, &__args);
+	}
+
       auto const __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
       auto const __atime = chrono::steady_clock::now() + __reltime;
-      return __detail::__wait_until(__addr, __args, __atime);
+      return __detail::__wait_until(__addr, &__args, __atime);
     }
   } // namespace __detail
 
@@ -295,7 +304,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
        _Tp __val = __vfn();
        while (!__pred(__val))
 	 {
-	   auto __res = __detail::__wait_until(__wait_addr, __args, __atime);
+	   auto __res = __detail::__wait_until(__wait_addr, &__args, __atime);
 	   if (!__res.first)
 	     // timed out
 	     return __res.first; // C++26 will also return last observed __val
@@ -313,7 +322,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 				  bool __bare_wait = false) noexcept
     {
       __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
-      auto __res = __detail::__wait_until(__addr, __args, __atime);
+      auto __res = __detail::__wait_until(__addr, &__args, __atime);
       return __res.first; // C++26 will also return last observed __val
     }
 
@@ -345,7 +354,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _Tp __val = __vfn();
       while (!__pred(__val))
 	{
-	  auto __res = __detail::__wait_for(__wait_addr, __args, __rtime);
+	  auto __res = __detail::__wait_for(__wait_addr, &__args, __rtime);
 	  if (!__res.first)
 	    // timed out
 	    return __res.first; // C++26 will also return last observed __val
@@ -363,7 +372,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 				bool __bare_wait = false) noexcept
   {
     __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
-    auto __res = __detail::__wait_for(__addr, __args, __rtime);
+    auto __res = __detail::__wait_for(__addr, &__args, __rtime);
    return __res.first; // C++26 will also return last observed __val
   }
 
diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
index 63d132fc9a8..8abcdec3d50 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -212,23 +212,28 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
        __abi_version_mask = 0xffff0000,
     };
 
-    struct __wait_args
+    struct __wait_args_base
     {
-      __platform_wait_t _M_old;
-      int _M_order = __ATOMIC_ACQUIRE;
       __wait_flags _M_flags;
+      int _M_order = __ATOMIC_ACQUIRE;
+      __platform_wait_t _M_old = 0;
+    };
 
+    struct __wait_args : __wait_args_base
+    {
       template<typename _Tp>
 	explicit __wait_args(const _Tp* __addr,
 			     bool __bare_wait = false) noexcept
-	    : _M_flags{ _S_flags_for(__addr, __bare_wait) }
+	    : __wait_args_base{ _S_flags_for(__addr, __bare_wait) }
 	{ }
 
       __wait_args(const __platform_wait_t* __addr, __platform_wait_t __old,
 		  int __order, bool __bare_wait = false) noexcept
-	  : _M_old{ __old }
-	  , _M_order{ __order }
-	  , _M_flags{ _S_flags_for(__addr, __bare_wait) }
+	  : __wait_args_base{ _S_flags_for(__addr, __bare_wait), __order, __old }
+	{ }
+
+      explicit __wait_args(const __wait_args_base& __base)
+	  : __wait_args_base{ __base }
 	{ }
 
       __wait_args(const __wait_args&) noexcept = default;
@@ -254,6 +259,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __res;
       }
 
+      __wait_args&
+      operator|=(__wait_flags __flag) noexcept
+      {
+	using __t = underlying_type_t<__wait_flags>;
+	const auto __flags = static_cast<__t>(_M_flags)
+			     | static_cast<__t>(__flag);
+	_M_flags = __wait_flags{ __flags };
+	return *this;
+      }
+
     private:
       static int
       constexpr _S_default_flags() noexcept
@@ -264,7 +279,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       }
 
       template<typename _Tp>
-	static int
+	static __wait_flags
 	constexpr _S_flags_for(const _Tp*, bool __bare_wait) noexcept
 	{
 	  auto __res = _S_default_flags();
@@ -272,7 +287,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	    __res |= static_cast<int>(__wait_flags::__track_contention);
 	  if constexpr (!__platform_wait_uses_type<_Tp>)
 	    __res |= static_cast<int>(__wait_flags::__proxy_wait);
-	  return __res;
+	  return static_cast<__wait_flags>(__res);
 	}
 
       template<typename _Tp>
@@ -287,13 +302,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
     using __wait_result_type = pair<bool, __platform_wait_t>;
     inline __wait_result_type
-    __spin_impl(const __platform_wait_t* __addr, __wait_args __args)
+    __spin_impl(const __platform_wait_t* __addr, const __wait_args_base* __args)
     {
       __platform_wait_t __val;
       for (auto __i = 0; __i < __atomic_spin_count; ++__i)
 	{
-	  __atomic_load(__addr, &__val, __args._M_order);
-	  if (__val != __args._M_old)
+	  __atomic_load(__addr, &__val, __args->_M_order);
+	  if (__val != __args->_M_old)
 	    return make_pair(true, __val);
 	  if (__i < __atomic_spin_count_relax)
 	    __detail::__thread_relax();
@@ -304,8 +319,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     }
 
     inline __wait_result_type
-    __wait_impl(const __platform_wait_t* __addr, __wait_args __args)
+    __wait_impl(const __platform_wait_t* __addr, const __wait_args_base* __a)
     {
+      __wait_args __args{ *__a };
 #ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
       __waiter_pool_impl* __pool = nullptr;
 #else
@@ -314,20 +330,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
       __platform_wait_t* __wait_addr;
+      __platform_wait_t __old;
       if (__args & __wait_flags::__proxy_wait)
 	{
 #ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
 	  __pool = &__waiter_pool_impl::_S_impl_for(__addr);
 #endif
 	  __wait_addr = &__pool->_M_ver;
-	  __atomic_load(__wait_addr, &__args._M_old, __args._M_order);
+	  __atomic_load(__wait_addr, &__old, __args._M_order);
 	}
       else
-	__wait_addr = const_cast<__platform_wait_t*>(__addr);
+	{
+	  __wait_addr = const_cast<__platform_wait_t*>(__addr);
+	  __old = __args._M_old;
+	}
+
 
       if (__args & __wait_flags::__do_spin)
 	{
-	  auto __res = __detail::__spin_impl(__wait_addr, __args);
+	  auto __res = __detail::__spin_impl(__wait_addr, __a);
 	  if (__res.first)
 	    return __res;
 	  if (__args & __wait_flags::__spin_only)
@@ -369,8 +390,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
     inline void
     __notify_impl(const __platform_wait_t* __addr, [[maybe_unused]] bool __all,
-		  __wait_args __args)
+		  const __wait_args_base* __a)
     {
+      __wait_args __args{ *__a };
 #ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
       __waiter_pool_impl* __pool = nullptr;
 #else
@@ -421,7 +443,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _Tp __val = __vfn();
       while (!__pred(__val))
 	{
-	  __detail::__wait_impl(__wait_addr, __args);
+	  __detail::__wait_impl(__wait_addr, &__args);
 	  __val = __vfn();
 	}
       // C++26 will return __val
@@ -434,7 +456,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
     __detail::__wait_args __args{ __addr, __old, __order };
     // C++26 will not ignore the return value here
-    __detail::__wait_impl(__addr, __args);
+    __detail::__wait_impl(__addr, &__args);
   }
 
   template<typename _Tp, typename _ValFn>
@@ -455,7 +477,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       const auto __wait_addr =
 	  reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
       __detail::__wait_args __args{ __addr, __bare_wait };
-      __detail::__notify_impl(__wait_addr, __all, __args);
+      __detail::__notify_impl(__wait_addr, __all, &__args);
     }
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
-- 
2.41.0



* Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization
  2023-11-16 13:45 [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization Jonathan Wakely
  2023-11-16 13:45 ` [PATCH 2/2] libstdc++: Pass __wait_args to internal API by const pointer Jonathan Wakely
@ 2023-11-16 20:46 ` Jonathan Wakely
  2024-03-09 12:18 ` Jonathan Wakely
  2 siblings, 0 replies; 10+ messages in thread
From: Jonathan Wakely @ 2023-11-16 20:46 UTC (permalink / raw)
  To: libstdc++, gcc-patches; +Cc: Thomas Rodgers, Thomas Rodgers

On Thu, 16 Nov 2023 at 13:49, Jonathan Wakely <jwakely@redhat.com> wrote:
>
> From: Thomas Rodgers <trodgers@redhat.com>
>
> These two patches were written by Tom earlier this year, before he left
> Red Hat. We should finish reviewing them for GCC 14 (and probably squash
> them into one?)
>
> Tom, you mentioned further work that changes the __platform_wait_t* to
> uintptr_t, is that ready, or likely to be ready very soon?
>
> Tested x86_64-linux, testing underway on powerpc-aix and sparc-solaris.

I'm seeing this on AIX and Solaris:

WARNING: program timed out.
FAIL: 30_threads/semaphore/try_acquire.cc  -std=gnu++20 execution test



* Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization
  2023-11-16 13:45 [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization Jonathan Wakely
  2023-11-16 13:45 ` [PATCH 2/2] libstdc++: Pass __wait_args to internal API by const pointer Jonathan Wakely
  2023-11-16 20:46 ` [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization Jonathan Wakely
@ 2024-03-09 12:18 ` Jonathan Wakely
  2024-03-09 12:27   ` Jonathan Wakely
  2 siblings, 1 reply; 10+ messages in thread
From: Jonathan Wakely @ 2024-03-09 12:18 UTC (permalink / raw)
  To: libstdc++, gcc-patches; +Cc: Thomas Rodgers, Nate Eldredge


On Thu, 16 Nov 2023 at 13:49, Jonathan Wakely <jwakely@redhat.com> wrote:

> From: Thomas Rodgers <trodgers@redhat.com>
>
> These two patches were written by Tom earlier this year, before he left
> Red Hat. We should finish reviewing them for GCC 14 (and probably squash
> them into one?)
>
> Tom, you mentioned further work that changes the __platform_wait_t* to
> uintptr_t, is that ready, or likely to be ready very soon?
>
> Tested x86_64-linux, testing underway on powerpc-aix and sparc-solaris.
>
>
> -- >8 --
>
> This represents a major refactoring of the previous atomic::wait
> and atomic::notify implementation detail. The aim of this change
> is to simplify the implementation details and position the resulting
> implementation so that much of the current header-only detail
> can be moved into the shared library, while also accounting for
> anticipated changes to wait/notify functionality for C++26.
>
> The previous implementation implemented spin logic in terms of
> the types __default_spin_policy, __timed_backoff_spin_policy, and
> the free function __atomic_spin. These are replaced in favor of
> two new free functions: __spin_impl and __spin_until_impl. These
> currently inline free functions are expected to be moved into the
> libstdc++ shared library in a future commit.
>
> The previous implementation derived untimed and timed wait
> implementation detail from __detail::__waiter_pool_base. This
> is-a relationship is removed in the new version and the previous
> implementation detail is renamed to reflect this change. The
> static _S_for member has been renamed as well to indicate that it
> returns the __waiter_pool_impl entry in the static 'side table'
> for a given awaited address.
>
> This new implementation replaces all of the non-templated waiting
> detail of __waiter_base, __waiter_pool, __waiter, __enters_wait, and
> __bare_wait with the __wait_impl free function, and the supporting
> __wait_flags enum and __wait_args struct. This currently inline free
> function is expected to be moved into the libstdc++ shared library
> in a future commit.
>
> This new implementation replaces all of the non-templated notifying
> detail of __waiter_base, __waiter_pool, and __waiter with the
> __notify_impl free function. This currently inline free function
> is expected to be moved into the libstdc++ shared library in a
> future commit.
>
> The __atomic_wait_address template function is updated to account
> for the above changes and to support the expected C++26 change to
> pass the most recent observed value to the caller supplied predicate.
>
> A new non-templated __atomic_wait_address_v free function is added
> that works only for atomic types that operate on __platform_wait_t
> and requires the caller to supply a memory order. This is intended
> to be the simplest code path for such types.
>
> The __atomic_wait_address_v template function is now implemented in
> terms of the new __atomic_wait_address template and continues to accept
> a user supplied "value function" to retrieve the current value of
> the atomic.
>
> The __atomic_notify_address template function is updated to account
> for the above changes.
>
> The template __platform_wait_until_impl is renamed to
> __platform_wait_until. The previous __platform_wait_until template is
> deleted and the functionality previously provided is moved to the new
> template function __wait_until. A similar change is made to the
> __cond_wait_until_impl/__cond_wait_until implementation.
>
> This new implementation similarly replaces all of the non-templated
> waiting detail of __timed_waiter_pool, __timed_waiter, etc. with
> the new __wait_until_impl free function. This currently inline free
> function is expected to be moved into the libstdc++ shared library
> in a future commit.
>
> This implementation replaces all templated waiting functions that
> manage clock conversion as well as relative waiting (wait_for) with
> the new template functions __wait_until and __wait_for.
>
> Similarly the previous implementation detail for the various
> __atomic_wait_address_Xxx templates is adjusted to account for the
> implementation changes outlined above.
>
> All of the "bare wait" versions of __atomic_wait_Xxx have been removed
> and replaced with a defaulted boolean __bare_wait parameter on the
> new version of these templates.
>
> libstdc++-v3/ChangeLog:
>
>         * include/bits/atomic_timed_wait.h:
>         (__detail::__platform_wait_until_impl): Rename to
>         __platform_wait_until.
>         (__detail::__platform_wait_until): Remove previous
>         definition.
>         (__detail::__cond_wait_until_impl): Rename to
>         __cond_wait_until.
>         (__detail::__cond_wait_until): Remove previous
>         definition.
>         (__detail::__spin_until_impl): New function.
>         (__detail::__wait_until_impl): New function.
>         (__detail::__wait_until): New function.
>         (__detail::__wait_for): New function.
>         (__detail::__timed_waiter_pool): Remove type.
>         (__detail::__timed_backoff_spin_policy): Remove type.
>         (__detail::__timed_waiter): Remove type.
>         (__detail::__enters_timed_wait): Remove type alias.
>         (__detail::__bare_timed_wait): Remove type alias.
>         (__atomic_wait_address_until): Adjust to new implementation
>         detail.
>         (__atomic_wait_address_until_v): Likewise.
>         (__atomic_wait_address_bare): Remove.
>         (__atomic_wait_address_for): Adjust to new implementation
>         detail.
>         (__atomic_wait_address_for_v): Likewise.
>         (__atomic_wait_address_for_bare): Remove.
>         * include/bits/atomic_wait.h: Include bits/stl_pair.h.
>         (__detail::__default_spin_policy): Remove type.
>         (__detail::__atomic_spin): Remove function.
>         (__detail::__waiter_pool_base): Rename to __waiter_pool_impl.
>         Remove _M_notify.  Rename _S_for to _S_impl_for.
>         (__detail::__waiter_base): Remove type.
>         (__detail::__waiter_pool): Remove type.
>         (__detail::__waiter): Remove type.
>         (__detail::__enters_wait): Remove type alias.
>         (__detail::__bare_wait): Remove type alias.
>         (__detail::__wait_flags): New enum.
>         (__detail::__wait_args): New struct.
>         (__detail::__wait_result_type): New type alias.
>         (__detail::__spin_impl): New function.
>         (__detail::__wait_impl): New function.
>         (__atomic_wait_address): Adjust to new implementation detail.
>         (__atomic_wait_address_v): Likewise.
>         (__atomic_notify_address): Likewise.
>         (__atomic_wait_address_bare): Delete.
>         (__atomic_notify_address_bare): Likewise.
>         * include/bits/semaphore_base.h: Adjust implementation to
>         use new __atomic_wait_address_v contract.
>         * include/std/barrier: Adjust implementation to use new
>         __atomic_wait contract.
>         * include/std/latch: Adjust implementation to use new
>         __atomic_wait contract.
>         * testsuite/29_atomics/atomic/wait_notify/100334.cc (main):
>         Adjust for __detail::__waiter_pool_base renaming.
> ---
>  libstdc++-v3/include/bits/atomic_timed_wait.h | 549 ++++++++----------
>  libstdc++-v3/include/bits/atomic_wait.h       | 486 ++++++++--------
>  libstdc++-v3/include/bits/semaphore_base.h    |  53 +-
>  libstdc++-v3/include/std/barrier              |   6 +-
>  libstdc++-v3/include/std/latch                |   5 +-
>  .../29_atomics/atomic/wait_notify/100334.cc   |   4 +-
>  6 files changed, 514 insertions(+), 589 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h
> b/libstdc++-v3/include/bits/atomic_timed_wait.h
> index ba21ab20a11..90427ef8b42 100644
> --- a/libstdc++-v3/include/bits/atomic_timed_wait.h
> +++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
> @@ -74,62 +74,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>  #define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>      // returns true if wait ended before timeout
> -    template<typename _Dur>
> -      bool
> -      __platform_wait_until_impl(const __platform_wait_t* __addr,
> -                                __platform_wait_t __old,
> -                                const chrono::time_point<__wait_clock_t,
> _Dur>&
> -                                     __atime) noexcept
> -      {
> -       auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
> -       auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime -
> __s);
>

For much of the mail below I've removed many of the - lines in the diff,
because the changes are huge and often the removed - lines are completely
unrelated to the added + lines adjacent to them.

Just seeing the new code without unrelated code next to it is clearer.


> +    bool
> +    __platform_wait_until(const __platform_wait_t* __addr,
> +                         __platform_wait_t __old,
> +                         const __wait_clock_t::time_point& __atime) noexcept
> +    {
> +      auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
> +      auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
>
> -       struct timespec __rt =
> +      struct timespec __rt =
>         {
>           static_cast<std::time_t>(__s.time_since_epoch().count()),
>           static_cast<long>(__ns.count())
>         };
>



> +      auto __e = syscall (SYS_futex, __addr,
> +          static_cast<int>(__futex_wait_flags::__wait_bitset_private),
> +          __old, &__rt, nullptr,
> +          static_cast<int>(__futex_wait_flags::__bitset_match_any));
> +      if (__e)
> +       {
> +         if (errno == ETIMEDOUT)
>             return false;
> -         }
> +         if (errno != EINTR && errno != EAGAIN)
> +           __throw_system_error(errno);
> +       }
> +       return true;
>        }
>  #else
>  // define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement
> __platform_wait_until()
> @@ -139,15 +109,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>  #ifdef _GLIBCXX_HAS_GTHREADS
>      // Returns true if wait ended before timeout.
>


> +    bool
> +    __cond_wait_until(__condvar& __cv, mutex& __mx,
> +                     const __wait_clock_t::time_point& __atime)
> +    {
>         auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
>         auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime -
> __s);
>
> @@ -158,293 +123,261 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>           };
>
>  #ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
> -       if constexpr (is_same_v<chrono::steady_clock, _Clock>)
> +       if constexpr (is_same_v<chrono::steady_clock, __wait_clock_t>)
>           __cv.wait_until(__mx, CLOCK_MONOTONIC, __ts);
>         else
>  #endif
>           __cv.wait_until(__mx, __ts);
>



> +       return __wait_clock_t::now() < __atime;
>        }
>  #endif // _GLIBCXX_HAS_GTHREADS
>



> +    inline __wait_result_type
> +    __spin_until_impl(const __platform_wait_t* __addr, __wait_args __args,
> +                     const __wait_clock_t::time_point& __deadline)
>      {
>



> +      auto __t0 = __wait_clock_t::now();
> +      using namespace literals::chrono_literals;
> +
> +      __platform_wait_t __val;
> +      auto __now = __wait_clock_t::now();
> +      for (; __now < __deadline; __now = __wait_clock_t::now())
> +      {
> +       auto __elapsed = __now - __t0;
> +#ifndef _GLIBCXX_NO_SLEEP
> +       if (__elapsed > 128ms)
> +       {
> +         this_thread::sleep_for(64ms);
> +       }
> +       else if (__elapsed > 64us)
> +       {
> +         this_thread::sleep_for(__elapsed / 2);
> +       }
> +       else
> +#endif
> +       if (__elapsed > 4us)
> +       {
> +         __thread_yield();
> +       }
> +       else
> +       {
> +         auto __res = __detail::__spin_impl(__addr, __args);
> +         if (__res.first)
> +           return __res;
> +       }
> +
> +       __atomic_load(__addr, &__val, __args._M_order);
> +       if (__val != __args._M_old)
> +           return make_pair(true, __val);
> +      }
> +      return make_pair(false, __val);
> +    }
> +
> +    inline __wait_result_type
> +    __wait_until_impl(const __platform_wait_t* __addr, __wait_args __args,
> +                     const __wait_clock_t::time_point& __atime)
> +    {
> +#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +      __waiter_pool_impl* __pool = nullptr;
> +#else
> +      // if we don't have __platform_wait, we always need the side-table
> +      __waiter_pool_impl* __pool =
> &__waiter_pool_impl::_S_impl_for(__addr);
> +#endif
> +
> +      __platform_wait_t* __wait_addr;
> +      if (__args & __wait_flags::__proxy_wait)
>         {
>  #ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>
>


> +         __pool = &__waiter_pool_impl::_S_impl_for(__addr);
> +#endif
> +         __wait_addr = &__pool->_M_ver;
> +         __atomic_load(__wait_addr, &__args._M_old, __args._M_order);
>         }
>


> +      else
> +       __wait_addr = const_cast<__platform_wait_t*>(__addr);
>



> +      if (__args & __wait_flags::__do_spin)
> +       {
> +         auto __res = __detail::__spin_until_impl(__wait_addr, __args,
> __atime);
> +         if (__res.first)
> +           return __res;
> +         if (__args & __wait_flags::__spin_only)
> +           return __res;
>

As observed by Nate Eldredge, this early return is always taken, even if
the caller didn't set __spin_only.
That's because __args & __spin_only is __args & 12, which is non-zero
whenever __do_spin is set (and it must be set, or we wouldn't have
entered this block).

That means that any call to __wait_impl with __do_spin set in the flags
(which is in the default flags) will return before ever waiting, resulting
in the caller doing a busy loop.


> +       }
>



> +      if (!(__args & __wait_flags::__track_contention))
>

This condition seems backwards. We want to track contention (i.e. use
_M_enter_wait and _M_leave_wait) when the flag is set, but this does it
when it's not set.

> +       {
>
>

> +       // caller does not externally track contention
> +#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +       __pool = (__pool == nullptr) ?
> &__waiter_pool_impl::_S_impl_for(__addr)
> +                                    : __pool;
>

This would be simpler as just:

    if (!__pool)
      __pool = ...;


> +#endif
> +       __pool->_M_enter_wait();
>        }
>



> +      __wait_result_type __res;
> +#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +      if (__platform_wait_until(__wait_addr, __args._M_old, __atime))
> +       __res = make_pair(true, __args._M_old);
> +      else
> +       __res = make_pair(false, __args._M_old);
> +#else
> +      __platform_wait_t __val;
> +      __atomic_load(__wait_addr, &__val, __args._M_order);
> +      if (__val == __args._M_old)
> +       {
> +          lock_guard<mutex> __l{ __pool->_M_mtx };
> +          __atomic_load(__wait_addr, &__val, __args._M_order);
> +          if (__val == __args._M_old &&
> +              __cond_wait_until(__pool->_M_cv, __pool->_M_mtx, __atime))
> +            __res = make_pair(true, __val);
>

There's a missing 'else' here, so we don't set anything in __res in this
case.

I think it probably wants to be {false, __val} ?

> +       }
> +      else
> +       __res = make_pair(false, __val);
> +#endif
> +
> +      if (!(__args & __wait_flags::__track_contention))
> +         // caller does not externally track contention
> +         __pool->_M_leave_wait();
> +      return __res;
> +    }
> +
> +    template<typename _Clock, typename _Dur>
> +      __wait_result_type
> +      __wait_until(const __platform_wait_t* __addr, __wait_args __args,
> +                  const chrono::time_point<_Clock, _Dur>& __atime)
> noexcept
>        {
>
> +       if constexpr (is_same_v<__wait_clock_t, _Clock>)
> +         return __detail::__wait_until_impl(__addr, __args, __atime);
> +       else
>           {
>
> +            auto __res = __detail::__wait_until_impl(__addr, __args,
> +                                           __to_wait_clock(__atime));
> +            if (!__res.first)
> +              {
> +                 // We got a timeout when measured against __clock_t but
> +                 // we need to check against the caller-supplied clock
> +                 // to tell whether we should return a timeout.
> +                 if (_Clock::now() < __atime)
> +                   return make_pair(true, __res.second);
> +               }
> +             return __res;
> +           }
> +      }
>
>
> +    template<typename _Rep, typename _Period>
> +      __wait_result_type
> +      __wait_for(const __platform_wait_t* __addr, __wait_args __args,
> +                const chrono::duration<_Rep, _Period>& __rtime) noexcept
> +    {
> +      if (!__rtime.count())
> +       // no rtime supplied, just spin a bit
> +       return __detail::__wait_impl(__addr, __args |
> __wait_flags::__spin_only);
>

This should set __do_spin | __spin_only if the latter no longer implies the
former.



> +      auto const __reltime =
> chrono::ceil<__wait_clock_t::duration>(__rtime);
> +      auto const __atime = chrono::steady_clock::now() + __reltime;
> +      return __detail::__wait_until(__addr, __args, __atime);
> +    }
>    } // namespace __detail
>
>    // returns true if wait ended before timeout
> +  template<typename _Tp,
> +          typename _Pred, typename _ValFn,
> +          typename _Clock, typename _Dur>
> +    bool
> +    __atomic_wait_address_until(const _Tp* __addr, _Pred&& __pred,
> +                               _ValFn&& __vfn,
> +                               const chrono::time_point<_Clock, _Dur>&
> __atime,
> +                               bool __bare_wait = false) noexcept
> +    {
> +       const auto __wait_addr =
> +          reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
> +       __detail::__wait_args __args{ __addr, __bare_wait };
> +       _Tp __val = __vfn();
> +       while (!__pred(__val))
> +        {
> +          auto __res = __detail::__wait_until(__wait_addr, __args,
> __atime);
> +          if (!__res.first)
> +            // timed out
> +            return __res.first; // C++26 will also return last observed
> __val
> +          __val = __vfn();
> +        }
> +       return true; // C++26 will also return last observed __val
> +    }
> +
> +  template<typename _Clock, typename _Dur>
> +    bool
> +    __atomic_wait_address_until_v(const __detail::__platform_wait_t*
> __addr,
> +                                 __detail::__platform_wait_t __old,
> +                                 int __order,
> +                                 const chrono::time_point<_Clock, _Dur>&
> __atime,
> +                                 bool __bare_wait = false) noexcept
> +    {
> +      __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
> +      auto __res = __detail::__wait_until(__addr, __args, __atime);
> +      return __res.first; // C++26 will also return last observed __val
> +    }
> +
>    template<typename _Tp, typename _ValFn,
>            typename _Clock, typename _Dur>
>      bool
>      __atomic_wait_address_until_v(const _Tp* __addr, _Tp&& __old,
> _ValFn&& __vfn,
> -                       const chrono::time_point<_Clock, _Dur>&
> -                           __atime) noexcept
> +                                 const chrono::time_point<_Clock, _Dur>&
> __atime,
> +                                 bool __bare_wait = false) noexcept
>      {
> -      __detail::__enters_timed_wait __w{__addr};
> -      return __w._M_do_wait_until_v(__old, __vfn, __atime);
> +       auto __pfn = [&](const _Tp& __val)
> +          { return !__detail::__atomic_compare(__old, __val); };
> +       return __atomic_wait_address_until(__addr, __pfn,
> forward<_ValFn>(__vfn),
> +                                         __atime, __bare_wait);
>      }
>
> -  template<typename _Tp, typename _Pred,
> -          typename _Clock, typename _Dur>
> -    bool
> -    __atomic_wait_address_until(const _Tp* __addr, _Pred __pred,
> -                               const chrono::time_point<_Clock, _Dur>&
> -                                                             __atime)
> noexcept
> -    {
> -      __detail::__enters_timed_wait __w{__addr};
> -      return __w._M_do_wait_until(__pred, __atime);
> -    }
> +  template<typename _Tp,
> +          typename _Pred, typename _ValFn,
> +          typename _Rep, typename _Period>
> +   bool
> +   __atomic_wait_address_for(const _Tp* __addr, _Pred&& __pred,
> +                            _ValFn&& __vfn,
> +                            const chrono::duration<_Rep, _Period>&
> __rtime,
> +                            bool __bare_wait = false) noexcept
> +  {
> +      const auto __wait_addr =
> +         reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
> +      __detail::__wait_args __args{ __addr, __bare_wait };
> +      _Tp __val = __vfn();
> +      while (!__pred(__val))
> +       {
> +         auto __res = __detail::__wait_for(__wait_addr, __args, __rtime);
> +         if (!__res.first)
> +           // timed out
> +           return __res.first; // C++26 will also return last observed
> __val
> +         __val = __vfn();
> +       }
> +      return true; // C++26 will also return last observed __val
> +  }
>
> -  template<typename _Pred,
> -          typename _Clock, typename _Dur>
> +  template<typename _Rep, typename _Period>
>      bool
> -    __atomic_wait_address_until_bare(const __detail::__platform_wait_t*
> __addr,
> -                               _Pred __pred,
> -                               const chrono::time_point<_Clock, _Dur>&
> -                                                             __atime)
> noexcept
> -    {
> -      __detail::__bare_timed_wait __w{__addr};
> -      return __w._M_do_wait_until(__pred, __atime);
> -    }
> +    __atomic_wait_address_for_v(const __detail::__platform_wait_t* __addr,
> +                               __detail::__platform_wait_t __old,
> +                               int __order,
> +                               const chrono::time_point<_Rep, _Period>&
> __rtime,
> +                               bool __bare_wait = false) noexcept
> +  {
> +    __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
> +    auto __res = __detail::__wait_for(__addr, __args, __rtime);
> +    return __res.first; // C++26 will also return last observed __val
> +  }
>
>    template<typename _Tp, typename _ValFn,
>            typename _Rep, typename _Period>
>      bool
>      __atomic_wait_address_for_v(const _Tp* __addr, _Tp&& __old, _ValFn&&
> __vfn,
> -                     const chrono::duration<_Rep, _Period>& __rtime)
> noexcept
> +                               const chrono::duration<_Rep, _Period>&
> __rtime,
> +                               bool __bare_wait = false) noexcept
>      {
> -      __detail::__enters_timed_wait __w{__addr};
> -      return __w._M_do_wait_for_v(__old, __vfn, __rtime);
> -    }
> -
> -  template<typename _Tp, typename _Pred,
> -          typename _Rep, typename _Period>
> -    bool
> -    __atomic_wait_address_for(const _Tp* __addr, _Pred __pred,
> -                     const chrono::duration<_Rep, _Period>& __rtime)
> noexcept
> -    {
> -
> -      __detail::__enters_timed_wait __w{__addr};
> -      return __w._M_do_wait_for(__pred, __rtime);
> -    }
> -
> -  template<typename _Pred,
> -          typename _Rep, typename _Period>
> -    bool
> -    __atomic_wait_address_for_bare(const __detail::__platform_wait_t*
> __addr,
> -                       _Pred __pred,
> -                       const chrono::duration<_Rep, _Period>& __rtime)
> noexcept
> -    {
> -      __detail::__bare_timed_wait __w{__addr};
> -      return __w._M_do_wait_for(__pred, __rtime);
> +      auto __pfn = [&](const _Tp& __val)
> +         { return !__detail::__atomic_compare(__old, __val); };
> +      return __atomic_wait_address_for(__addr, __pfn,
> forward<_ValFn>(__vfn),
> +                                      __rtime, __bare_wait);
>      }
>  _GLIBCXX_END_NAMESPACE_VERSION
>  } // namespace std
> diff --git a/libstdc++-v3/include/bits/atomic_wait.h
> b/libstdc++-v3/include/bits/atomic_wait.h
> index 506b7f0cc9d..63d132fc9a8 100644
> --- a/libstdc++-v3/include/bits/atomic_wait.h
> +++ b/libstdc++-v3/include/bits/atomic_wait.h
> @@ -47,7 +47,8 @@
>  # include <bits/functexcept.h>
>  #endif
>
> -# include <bits/std_mutex.h>  // std::mutex, std::__condvar
> +#include <bits/stl_pair.h>
> +#include <bits/std_mutex.h>  // std::mutex, std::__condvar
>
>  namespace std _GLIBCXX_VISIBILITY(default)
>  {
> @@ -131,7 +132,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      __thread_yield() noexcept
>      {
>  #if defined _GLIBCXX_HAS_GTHREADS && defined _GLIBCXX_USE_SCHED_YIELD
> -     __gthread_yield();
> +      __gthread_yield();
>  #endif
>      }
>



> @@ -188,7 +157,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>         return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0;
>        }
>
> -    struct __waiter_pool_base
> +    struct __waiter_pool_impl
>

Would it make sense to call this type "__wait_proxy" or something like that?
"Waiter pool impl" doesn't really tell me much; where is the waiter pool
that this is implementing?

There used to be __waiter_pool and __waiter_pool_base and __waiter, but now
there's just __waiter_pool_impl which isn't very descriptive in isolation.



>      {
>        // Don't use std::hardware_destructive_interference_size here
> because we
>        // don't want the layout of library types to depend on compiler
> options.
> @@ -205,7 +174,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
>        __condvar _M_cv;
>  #endif
> -      __waiter_pool_base() = default;
> +      __waiter_pool_impl() = default;
>
>        void
>        _M_enter_wait() noexcept
>




> -
> -      static __waiter_pool_base&
> -      _S_for(const void* __addr) noexcept
> +      static __waiter_pool_impl&
> +      _S_impl_for(const void* __addr) noexcept
>        {
>         constexpr uintptr_t __ct = 16;
> -       static __waiter_pool_base __w[__ct];
> +       static __waiter_pool_impl __w[__ct];
>         auto __key = (uintptr_t(__addr) >> 2) % __ct;
>         return __w[__key];
>        }
>      };
>



> +    enum class __wait_flags : uint32_t
>      {
>


> +       __abi_version = 0,
> +       __proxy_wait = 1,
> +       __track_contention = 2,
> +       __do_spin = 4,
> +       __spin_only = 8 | __do_spin, // implies __do_spin
>

This should be just 8 and not also imply __do_spin.


> +       __abi_version_mask = 0xffff0000,
>      };
>
> -    template<typename _Tp>
> -      struct __waiter_base
> +    struct __wait_args
> +    {
> +      __platform_wait_t _M_old;
> +      int _M_order = __ATOMIC_ACQUIRE;
> +      __wait_flags _M_flags;
> +
> +      template<typename _Tp>
> +       explicit __wait_args(const _Tp* __addr,
> +                            bool __bare_wait = false) noexcept
> +           : _M_flags{ _S_flags_for(__addr, __bare_wait) }
> +       { }
> +
> +      __wait_args(const __platform_wait_t* __addr, __platform_wait_t
> __old,
> +                 int __order, bool __bare_wait = false) noexcept
> +         : _M_old{ __old }
> +         , _M_order{ __order }
> +         , _M_flags{ _S_flags_for(__addr, __bare_wait) }
> +       { }
> +
> +      __wait_args(const __wait_args&) noexcept = default;
> +      __wait_args&
> +      operator=(const __wait_args&) noexcept = default;
> +
> +      bool
> +      operator&(__wait_flags __flag) const noexcept
>        {
>
>

> +        using __t = underlying_type_t<__wait_flags>;
> +        return static_cast<__t>(_M_flags)
> +            & static_cast<__t>(__flag);
> +      }
>



> +      __wait_args
> +      operator|(__wait_flags __flag) const noexcept
> +      {
> +       using __t = underlying_type_t<__wait_flags>;
> +       __wait_args __res{ *this };
> +       const auto __flags = static_cast<__t>(__res._M_flags)
> +                            | static_cast<__t>(__flag);
> +       __res._M_flags = __wait_flags{ __flags };
> +       return __res;
> +      }
>

Seems like it would be clearer to define operator| for the actual enum
type, so that these member functions don't need to repeat all the casting
logic.



> +    private:
> +      static int
> +      constexpr _S_default_flags() noexcept
> +      {
> +       using __t = underlying_type_t<__wait_flags>;
> +       return static_cast<__t>(__wait_flags::__abi_version)
> +               | static_cast<__t>(__wait_flags::__do_spin);
> +      }
>


> +      template<typename _Tp>
> +       static int
> +       constexpr _S_flags_for(const _Tp*, bool __bare_wait) noexcept
>         {
>


> +         auto __res = _S_default_flags();
> +         if (!__bare_wait)
> +           __res |= static_cast<int>(__wait_flags::__track_contention);
> +         if constexpr (!__platform_wait_uses_type<_Tp>)
> +           __res |= static_cast<int>(__wait_flags::__proxy_wait);
> +         return __res;
>         }
>


> +      template<typename _Tp>
> +       static int
> +       _S_memory_order_for(const _Tp*, int __order) noexcept
>         {
>


> +         if constexpr (__platform_wait_uses_type<_Tp>)
> +           return __order;
> +         return __ATOMIC_ACQUIRE;
> +       }
> +    };
>

This function is never used. Should it be?
Is the idea to force the use of acquire ordering when waiting on a proxy
instead of waiting on the atomic variable directly?
Is that done elsewhere? Can this function be removed, or should it be used
somewhere?



> +
> +    using __wait_result_type = pair<bool, __platform_wait_t>;
> +    inline __wait_result_type
> +    __spin_impl(const __platform_wait_t* __addr, __wait_args __args)
> +    {
> +      __platform_wait_t __val;
> +      for (auto __i = 0; __i < __atomic_spin_count; ++__i)
> +       {
> +         __atomic_load(__addr, &__val, __args._M_order);
> +         if (__val != __args._M_old)
> +           return make_pair(true, __val);
> +         if (__i < __atomic_spin_count_relax)
> +           __detail::__thread_relax();
> +         else
> +           __detail::__thread_yield();
> +       }
> +      return make_pair(false, __val);
>

Every use of make_pair can be replaced by just {x, y} to avoid making a
function call (which will compile faster too).


> +    }
> +
> +    inline __wait_result_type
> +    __wait_impl(const __platform_wait_t* __addr, __wait_args __args)
> +    {
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +      __waiter_pool_impl* __pool = nullptr;
> +#else
> +      // if we don't have __platform_wait, we always need the side-table
> +      __waiter_pool_impl* __pool =
> &__waiter_pool_impl::_S_impl_for(__addr);
> +#endif
> +
> +      __platform_wait_t* __wait_addr;
>

This could be const __platform_wait_t*


> +      if (__args & __wait_flags::__proxy_wait)
> +       {
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +         __pool = &__waiter_pool_impl::_S_impl_for(__addr);
> +#endif
> +         __wait_addr = &__pool->_M_ver;
> +         __atomic_load(__wait_addr, &__args._M_old, __args._M_order);
> +       }
> +      else
> +       __wait_addr = const_cast<__platform_wait_t*>(__addr);
> +
> +      if (__args & __wait_flags::__do_spin)
> +       {
> +         auto __res = __detail::__spin_impl(__wait_addr, __args);
> +         if (__res.first)
> +           return __res;
> +         if (__args & __wait_flags::__spin_only)
> +           return __res;
>

As above, this early return is always taken, resulting in busy loops.

> +       }
>



> +      if (!(__args & __wait_flags::__track_contention))
> +       {
> +         // caller does not externally track contention
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +         __pool = (__pool == nullptr) ?
> &__waiter_pool_impl::_S_impl_for(__addr)
> +                                      : __pool;
>

  if (!pool)
    pool = ...;


> +#endif
> +         __pool->_M_enter_wait();
> +       }
>



> +      __wait_result_type __res;
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +      __platform_wait(__wait_addr, __args._M_old);
> +      __res = make_pair(false, __args._M_old);
> +#else
> +      __platform_wait_t __val;
> +      __atomic_load(__wait_addr, &__val, __args._M_order);
> +      if (__val == __args._M_old)
> +       {
> +           lock_guard<mutex> __l{ __pool->_M_mtx };
> +           __atomic_load(__wait_addr, &__val, __args._M_order);
> +           if (__val == __args._M_old)
> +               __pool->_M_cv.wait(__pool->_M_mtx);
> +       }
> +      __res = make_pair(false, __val);
> +#endif
>
> -    using __enters_wait = __waiter<std::true_type>;
> -    using __bare_wait = __waiter<std::false_type>;
> +      if (!(__args & __wait_flags::__track_contention))
>

Condition is backwards?


> +       // caller does not externally track contention
> +       __pool->_M_leave_wait();
>

This _M_leave_wait() could be done by an RAII type's destructor instead.


> +      return __res;
> +    }
> +
> +    inline void
> +    __notify_impl(const __platform_wait_t* __addr, [[maybe_unused]] bool __all,
> +                 __wait_args __args)
> +    {
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +      __waiter_pool_impl* __pool = nullptr;
> +#else
> +      // if we don't have __platform_notify, we always need the side-table
> +      __waiter_pool_impl* __pool = &__waiter_pool_impl::_S_impl_for(__addr);
> +#endif
> +
> +      if (!(__args & __wait_flags::__track_contention))
>

Condition is backwards?


> +       {
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +         __pool = &__waiter_pool_impl::_S_impl_for(__addr);
> +#endif
> +         if (!__pool->_M_waiting())
> +           return;
> +       }
> +
> +      __platform_wait_t* __wait_addr;
> +      if (__args & __wait_flags::__proxy_wait)
> +       {
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +          __pool = (__pool == nullptr) ? &__waiter_pool_impl::_S_impl_for(__addr)
> +                                       : __pool;
>

  if (!pool)
    pool = ...;


> +#endif
> +          __wait_addr = &__pool->_M_ver;
> +          __atomic_fetch_add(__wait_addr, 1, __ATOMIC_RELAXED);
> +          __all = true;
> +        }
>

For a non-proxied wait, __wait_addr is uninitialized here, so the futex
wake uses a random address.

There needs to be `else __wait_addr = const_cast<__platform_wait_t*>(__addr);`
(as in __wait_impl).

> +
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +      __platform_notify(__wait_addr, __all);
> +#else
> +      lock_guard<mutex> __l{ __pool->_M_mtx };
> +      __pool->_M_cv.notify_all();
> +#endif
> +    }
>    } // namespace __detail
>
> +  template<typename _Tp,
> +          typename _Pred, typename _ValFn>
> +    void
> +    __atomic_wait_address(const _Tp* __addr,
> +                         _Pred&& __pred, _ValFn&& __vfn,
> +                         bool __bare_wait = false) noexcept
> +    {
> +      const auto __wait_addr =
> +         reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
> +      __detail::__wait_args __args{ __addr, __bare_wait };
> +      _Tp __val = __vfn();
> +      while (!__pred(__val))
> +       {
>

As observed by Nate, the old value is never set in the __args, so the wait
always uses 0 (the value set in the __wait_args constructor).

It needs:

    if constexpr (__platform_wait_uses_type<_Tp>)
      __args._M_old = __builtin_bit_cast(__platform_wait_t, __val);

For proxied waits _M_old isn't used, pool->_M_ver is used instead.



> +         __detail::__wait_impl(__wait_addr, __args);
> +         __val = __vfn();
> +       }
> +      // C++26 will return __val
> +    }
> +
> +  inline void
> +  __atomic_wait_address_v(const __detail::__platform_wait_t* __addr,
> +                         __detail::__platform_wait_t __old,
> +                         int __order)
> +  {
> +    __detail::__wait_args __args{ __addr, __old, __order };
> +    // C++26 will not ignore the return value here
> +    __detail::__wait_impl(__addr, __args);
> +  }
> +
>    template<typename _Tp, typename _ValFn>
>      void
>      __atomic_wait_address_v(const _Tp* __addr, _Tp __old,
>                             _ValFn __vfn) noexcept
>      {
> diff --git a/libstdc++-v3/include/std/latch
> b/libstdc++-v3/include/std/latch
> index 27cd80d7100..525647c1701 100644
> --- a/libstdc++-v3/include/std/latch
> +++ b/libstdc++-v3/include/std/latch
> @@ -74,8 +74,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      _GLIBCXX_ALWAYS_INLINE void
>      wait() const noexcept
>      {
> -      auto const __pred = [this] { return this->try_wait(); };
> -      std::__atomic_wait_address(&_M_a, __pred);
> +        auto const __vfn = [this] { return this->try_wait(); };
> +        auto const __pred = [this](bool __b) { return __b; };
>

This second lambda doesn't need to capture 'this'.


> +        std::__atomic_wait_address(&_M_a, __pred, __vfn);
>      }
>
>      _GLIBCXX_ALWAYS_INLINE void
>
>


* Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization
  2024-03-09 12:18 ` Jonathan Wakely
@ 2024-03-09 12:27   ` Jonathan Wakely
  0 siblings, 0 replies; 10+ messages in thread
From: Jonathan Wakely @ 2024-03-09 12:27 UTC (permalink / raw)
  To: libstdc++, gcc-patches; +Cc: Thomas Rodgers, Nate Eldredge

On Sat, 9 Mar 2024 at 12:18, Jonathan Wakely <jwakely@redhat.com> wrote:

>
>
>
>> +    template<typename _Rep, typename _Period>
>> +      __wait_result_type
>> +      __wait_for(const __platform_wait_t* __addr, __wait_args __args,
>> +                const chrono::duration<_Rep, _Period>& __rtime) noexcept
>> +    {
>> +      if (!__rtime.count())
>> +       // no rtime supplied, just spin a bit
>> +       return __detail::__wait_impl(__addr, __args | __wait_flags::__spin_only);
>>
>
> This should set __do_spin | __spin_only if the latter no longer implies
> the former.
>
>
>>
>> +    enum class __wait_flags : uint32_t
>>      {
>>
>
>
>> +       __abi_version = 0,
>> +       __proxy_wait = 1,
>> +       __track_contention = 2,
>> +       __do_spin = 4,
>> +       __spin_only = 8 | __do_spin, // implies __do_spin
>>
>
> This should be just 8 and not also imply __do_spin.
>


Alternatively, we could have:

       __spin_only = 4,
       __spin_then_wait = 8,
       __do_spin = __spin_only | __spin_then_wait,

Then testing (flags & do_spin) would be true if either of the others is
set, but (flags & spin_only) would do the right thing.

But if we have spin_then_wait in the default flags, a caller that wants to
use spin_only has to clear the spin_then_wait flag, otherwise there are two
mutually exclusive flags set at once.

I think I prefer:

       __do_spin = 4,
       __spin_only = 8, // Ignored unless __do_spin is also set.

as this is simpler for callers.


* Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization
  2023-12-14 22:23     ` Thomas Rodgers
@ 2023-12-14 23:30       ` Nate Eldredge
  0 siblings, 0 replies; 10+ messages in thread
From: Nate Eldredge @ 2023-12-14 23:30 UTC (permalink / raw)
  To: Thomas Rodgers; +Cc: Jonathan Wakely, Jonathan Wakely, gcc-patches, libstdc++


On Thu, 14 Dec 2023, Thomas Rodgers wrote:

> I need to look at this a bit more (and not on my phone, at lunch).
> Ultimately, C++26 expects to add predicate waits and returning a
> ‘tri-state’ result isn’t something that’s been considered or likely to be
> approved.

Ok, then that seems to fit best with my second suggestion: the predicate 
should only test the value and do nothing else, and actually trying to 
decrement the semaphore is left up to the caller _M_acquire(), which must 
then retry __atomic_wait_address if the compare-exchange fails.

-- 
Nate Eldredge
nate@thatsmathematics.com


* Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization
  2023-12-11 20:18   ` Jonathan Wakely
@ 2023-12-14 22:23     ` Thomas Rodgers
  2023-12-14 23:30       ` Nate Eldredge
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Rodgers @ 2023-12-14 22:23 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Jonathan Wakely, Nate Eldredge, gcc-patches, libstdc++


I need to look at this a bit more (and not on my phone, at lunch).
Ultimately, C++26 expects to add predicate waits and returning a
‘tri-state’ result isn’t something that’s been considered or likely to be
approved.

On Mon, Dec 11, 2023 at 12:18 PM Jonathan Wakely <jwakely.gcc@gmail.com>
wrote:

> CCing Tom's current address, as he's not @redhat.com now.
>
> On Mon, 11 Dec 2023, 19:24 Nate Eldredge, <nate@thatsmathematics.com>
> wrote:
>
>> On Mon, 11 Dec 2023, Nate Eldredge wrote:
>>
>> > To fix, we need something like `__args._M_old = __val;` inside the loop in
>> > __atomic_wait_address(), so that we always wait on the exact value that the
>> > predicate __pred() rejected.  Again, there are similar instances in
>> > atomic_timed_wait.h.
>>
>> [...]


* Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization
  2023-12-11 19:24 ` Nate Eldredge
@ 2023-12-11 20:18   ` Jonathan Wakely
  2023-12-14 22:23     ` Thomas Rodgers
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Wakely @ 2023-12-11 20:18 UTC (permalink / raw)
  To: Nate Eldredge; +Cc: Jonathan Wakely, gcc-patches, libstdc++, Thomas Rodgers


CCing Tom's current address, as he's not @redhat.com now.

On Mon, 11 Dec 2023, 19:24 Nate Eldredge, <nate@thatsmathematics.com> wrote:

> On Mon, 11 Dec 2023, Nate Eldredge wrote:
>
> > To fix, we need something like `__args._M_old = __val;` inside the loop in
> > __atomic_wait_address(), so that we always wait on the exact value that the
> > predicate __pred() rejected.  Again, there are similar instances in
> > atomic_timed_wait.h.
>
> [...]
>


* Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization
  2023-12-11  8:16 Nate Eldredge
@ 2023-12-11 19:24 ` Nate Eldredge
  2023-12-11 20:18   ` Jonathan Wakely
  0 siblings, 1 reply; 10+ messages in thread
From: Nate Eldredge @ 2023-12-11 19:24 UTC (permalink / raw)
  To: jwakely; +Cc: trodgers, gcc-patches, libstdc++

On Mon, 11 Dec 2023, Nate Eldredge wrote:

> To fix, we need something like `__args._M_old = __val;` inside the loop in 
> __atomic_wait_address(), so that we always wait on the exact value that the 
> predicate __pred() rejected.  Again, there are similar instances in 
> atomic_timed_wait.h.

Thinking through this, there's another problem.  The main loop in 
__atomic_wait_address() would then be:

       while (!__pred(__val))
         {
           __args._M_old = __val;
           __detail::__wait_impl(__wait_addr, &__args);
           __val = __vfn();
         }

The idea being that we only call __wait_impl to wait on a value that the 
predicate said was unacceptable.  But looking for instance at its caller 
__atomic_semaphore::_M_acquire() in bits/semaphore_base.h, the predicate 
passed in is _S_do_try_acquire(), whose code is:

     _S_do_try_acquire(__detail::__platform_wait_t* __counter,
                       __detail::__platform_wait_t __old) noexcept
     {
       if (__old == 0)
         return false;

       return __atomic_impl::compare_exchange_strong(__counter,
                                                     __old, __old - 1,
                                                     memory_order::acquire,
                                                     memory_order::relaxed);
     }

It returns false if the value passed in was unacceptable (i.e. zero), *or* 
if it was nonzero (let's say 1) but the compare_exchange failed because 
another thread swooped in and modified the semaphore counter.  In that 
latter case, __atomic_wait_address() would pass 1 to __wait_impl(), which 
is likewise bad.  If the counter is externally changed back to 1 just 
before we call __platform_wait (that's the futex call), we would go to 
sleep waiting on a semaphore that is already available: deadlock.

I guess there's a couple ways to fix it.

You could have the "predicate" callback instead return a tri-state value: 
"all done, stop waiting" (like current true), "value passed is not 
acceptable" (like current false), and "value was acceptable but something 
else went wrong".  Only the second case should result in calling 
__wait_impl().  In the third case, __atomic_wait_address() should 
just reload the value (using __vfn()) and loop again.

Or, make the callback __pred() a pure predicate that only tests its input 
value for acceptability, without any other side effects.  Then have 
__atomic_wait_address() simply return as soon as __pred(__val) returns 
true.  It would be up to the caller to actually decrement the semaphore or 
whatever, and to call __atomic_wait_address() again if this fails.  In 
that case, __atomic_wait_address should probably return the final value 
that was read, so the caller can immediately do something like a 
compare-exchange using it, and not have to do an additional load and 
predicate test.

Or, make __pred() a pure predicate as before, and give 
__atomic_wait_address yet one more callback function argument, call it 
__taker(), whose job is to acquire the semaphore etc, and have 
__atomic_wait_address call it after __pred(__val) returns true.

-- 
Nate Eldredge
nate@thatsmathematics.com



* Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization
@ 2023-12-11  8:16 Nate Eldredge
  2023-12-11 19:24 ` Nate Eldredge
  0 siblings, 1 reply; 10+ messages in thread
From: Nate Eldredge @ 2023-12-11  8:16 UTC (permalink / raw)
  To: jwakely; +Cc: trodgers, gcc-patches, libstdc++

Ref: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636805.html, 
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636804.html

I found a couple of bugs in this patch set.

#1: In atomic_wait.h, we have __wait_flags defined to include:

        __do_spin = 4,
        __spin_only = 8 | __do_spin, // implies __do_spin

So __spin_only is defined as two bits, which breaks when we test `__args & 
__wait_flags::__spin_only` in __wait_impl().  The test evaluates true even 
when we have only set __do_spin (bit 2) without bit 3, which is the 
default at least on Linux and which is supposed to request a limited 
number of spins before calling futex() and sleeping.  You can observe this 
by seeing that a thread blocked on std::counting_semaphore::acquire() 
consumes 100% CPU indefinitely.

There is another instance in atomic_timed_wait.h.

Probably __spin_only should just be 8, and any configuration that wants 
it should be responsible for manually setting __do_spin as well.


#2: __atomic_wait_address() does not populate __args._M_old before passing 
to __wait_impl(), so it remains at its default value 0 (set by the 
constructor).  As a result, `__wait_impl` is effectively always waiting on 
the value 0 (i.e. until `*__addr != 0`) rather than whatever value is 
actually wanted.  This is of course wrong, and can deadlock under the 
right race condition.

To fix, we need something like `__args._M_old = __val;` inside the loop in 
__atomic_wait_address(), so that we always wait on the exact value that 
the predicate __pred() rejected.  Again, there are similar instances in 
atomic_timed_wait.h.

(In the current libstdc++ implementation, the predicate function loads the 
value itself and doesn't return it, so we wait instead on a possibly 
different value that was loaded somewhere else in _S_do_spin().  That is 
also wrong, because the latter value might have been acceptable to the 
predicate, and if so then waiting on it may deadlock.  A classic TOCTOU. 
This is Bug #104928, reported in March 2022 and still unfixed: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928.  The patch proposed 
here would fix it, though.)

-- 
Nate Eldredge
nate@thatsmathematics.com



Thread overview: 10+ messages
2023-11-16 13:45 [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization Jonathan Wakely
2023-11-16 13:45 ` [PATCH 2/2] libstdc++: Pass __wait_args to internal API by const pointer Jonathan Wakely
2023-11-16 20:46 ` [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization Jonathan Wakely
2024-03-09 12:18 ` Jonathan Wakely
2024-03-09 12:27   ` Jonathan Wakely
2023-12-11  8:16 Nate Eldredge
2023-12-11 19:24 ` Nate Eldredge
2023-12-11 20:18   ` Jonathan Wakely
2023-12-14 22:23     ` Thomas Rodgers
2023-12-14 23:30       ` Nate Eldredge
