From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jwakely@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [216.205.24.124])
 by sourceware.org (Postfix) with ESMTP id D6C703858025
 for <libstdc++@gcc.gnu.org>; Thu, 15 Apr 2021 12:46:21 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org D6C703858025
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
 [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
 us-mta-531-Eb30spiHOkyhQWCusf8sEA-1; Thu, 15 Apr 2021 08:46:19 -0400
X-MC-Unique: Eb30spiHOkyhQWCusf8sEA-1
Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com
 [10.5.11.14])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 8F17579EC2;
 Thu, 15 Apr 2021 12:46:18 +0000 (UTC)
Received: from localhost (unknown [10.33.36.164])
 by smtp.corp.redhat.com (Postfix) with ESMTP id F07945D9C0;
 Thu, 15 Apr 2021 12:46:17 +0000 (UTC)
Date: Thu, 15 Apr 2021 13:46:17 +0100
From: Jonathan Wakely <jwakely@redhat.com>
To: Thomas Rodgers <rodgert@appliantology.com>
Cc: gcc-patches@gcc.gnu.org, libstdc++@gcc.gnu.org, trodgers@redhat.com,
 Thomas Rodgers <rodgert@twrodgers.com>
Subject: Re: [PATCH] [libstdc++] Refactor/cleanup of atomic wait implementation
Message-ID: <20210415124617.GN3008@redhat.com>
References: <20210303173123.GV3008@redhat.com>
 <20210323190052.261853-1-rodgert@appliantology.com>
MIME-Version: 1.0
In-Reply-To: <20210323190052.261853-1-rodgert@appliantology.com>
X-Clacks-Overhead: GNU Terry Pratchett
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Disposition: inline
X-Spam-Status: No, score=-14.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0,
 RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: libstdc++@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libstdc++ mailing list <libstdc++.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/libstdc++>,
 <mailto:libstdc++-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/libstdc++/>
List-Post: <mailto:libstdc++@gcc.gnu.org>
List-Help: <mailto:libstdc++-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/libstdc++>,
 <mailto:libstdc++-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 15 Apr 2021 12:46:24 -0000

On 23/03/21 12:00 -0700, Thomas Rodgers wrote:
>From: Thomas Rodgers <rodgert@twrodgers.com>
>
>* This patch addresses jwakely's previous feedback.
>* This patch also subsumes thiago.macieira@intel.com 's 'Uncontroversial

If this part is intended as part of the commit msg let's put Thiago's
name rather than email address, but I'm assuming this preamble isn't
intended for the commit anyway.

>  improvements to C++20 wait-related implementation'.
>* This patch also changes the atomic semaphore implementation to avoid
>  checking for any waiters before a FUTEX_WAKE op.
>
>This is a substantial rewrite of the atomic wait/notify (and timed wait
>counterparts) implementation.
>
>The previous __platform_wait looped on EINTR however this behavior is
>not required by the standard. A new _GLIBCXX_HAVE_PLATFORM_WAIT macro
>now controls whether wait/notify are implemented using a platform
>specific primitive or with a platform agnostic mutex/condvar. This
>patch only supplies a definition for linux futexes. A future update
>could add support __ulock_wait/wake on Darwin, for instance.
>
>The members of __waiters were lifted to a new base class. The members
>are now arranged such that overall sizeof(__waiters_base) fits in two
>cache lines (on platforms with at least 64 byte cache lines). The
>definition will also use destructive_interference_size for this if it
>is available.

N.B. that makes the ABI potentially different with different
compilers, e.g. if you compile it today it will use 64, but then you
compile it with some future version of Clang that defines the
interference sizes it might use a different value. That's OK for now,
but is something to be aware of and remember.


>The __waiters type is now specific to untimed waits. Timed waits have a
>corresponding __timed_waiters type. Much of the code has been moved from
>the previous __atomic_wait() free function to the __waiter_base template
>and a __waiter derived type is provided to implement the un-timed wait
>operations. A similar change has been made to the timed wait
>implementation.

While reading this code I keep getting confused between __waiter
singular and __waiters plural. Would something like __waiter_pool or
__waiters_mgr work instead of __waiters?

>The __atomic_spin code has been extended to take a spin policy which is
>invoked after the initial busy wait loop. The default policy is to
>return from the spin. The timed wait code adds a timed backoff spinning
>policy. The code from <thread> which implements this_thread::sleep_for,
>sleep_until has been moved to a new <bits/std_thread_sleep.h> header
>which allows the thread sleep code to be consumed without pulling in the
>whole of <thread>.

The new header is misnamed. The existing <bits/std_foo.h> headers all
define std::foo, but this doesn't define std::thread::sleep* or
std::thread_sleep*. I think <bits/thread_sleep.h> would be fine, or
<bits/this_thread_sleep.h> if you prefer that.

The original reason I introduced <bits/std_mutex.h> was that
<bits/mutex.h> seemed too likely to clash with something in glibc or
another project using "bits" as a prefix, so I figured std_mutex.h for
std::mutex would be safer. I had the same concern for <bits/thread.h>
and so that's <bits/std_thread.h> too, but I think thread_sleep is
probably sufficiently un-clashy, and this_thread_sleep definitely so.


>The entry points into the wait/notify code have been restructured to
>support either -
>   * Testing the current value of the atomic stored at the given address
>     and waiting on a notification.
>   * Applying a predicate to determine if the wait was satisfied.
>The entry points were renamed to make it clear that the wait and wake
>operations operate on addresses. The first variant takes the expected
>value and a function which returns the current value that should be used
>in comparison operations, these operations are named with a _v suffix
>(e.g. 'value'). All atomic<_Tp> wait/notify operations use the first
>variant. Barriers, latches and semaphores use the predicate variant.
>
>This change also centralizes what it means to compare values for the
>purposes of atomic<T>::wait rather than scattering through individual
>predicates.

I like this a lot more, thanks.


>diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
>index 2dc00676054..2e46691c59a 100644
>--- a/libstdc++-v3/include/bits/atomic_base.h
>+++ b/libstdc++-v3/include/bits/atomic_base.h
>@@ -1017,8 +1015,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       wait(const _Tp* __ptr, _Val<_Tp> __old,
> 	   memory_order __m = memory_order_seq_cst) noexcept
>       {
>-	std::__atomic_wait(__ptr, __old,
>-	    [=]() { return load(__ptr, __m) == __old; });
>+	std::__atomic_wait_address_v(__ptr, __old,
>+	    [__ptr, __m]() { return load(__ptr, __m); });

Pre-existing, but __ptr is dependent here so this needs to call
__atomic_impl::load to prevent ADL.


>diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
>index a0c5ef4374e..4b876236d2b 100644
>--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
>+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
>@@ -36,6 +36,7 @@
>
> #if __cpp_lib_atomic_wait
> #include <bits/functional_hash.h>
>+#include <bits/std_thread_sleep.h>
>
> #include <chrono>
>
>@@ -48,19 +49,34 @@ namespace std _GLIBCXX_VISIBILITY(default)
> {
> _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>-  enum class __atomic_wait_status { no_timeout, timeout };
>-
>   namespace __detail
>   {
>-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>-    using __platform_wait_clock_t = chrono::steady_clock;
>+    using __wait_clock_t = chrono::steady_clock;
>
>-    template<typename _Duration>
>-      __atomic_wait_status
>-      __platform_wait_until_impl(__platform_wait_t* __addr,
>-				 __platform_wait_t __val,
>-				 const chrono::time_point<
>-					  __platform_wait_clock_t, _Duration>&
>+    template<typename _Clock, typename _Dur>
>+      __wait_clock_t::time_point
>+      __to_wait_clock(const chrono::time_point<_Clock, _Dur>& __atime) noexcept
>+      {
>+	const typename _Clock::time_point __c_entry = _Clock::now();
>+	const __wait_clock_t::time_point __s_entry = __wait_clock_t::now();

This is copy&pasted from elsewhere where the "s" prefix is for
system_clock (or steady_clock) so maybe here we want__w_entry
for wait clock?

>+	const auto __delta = __atime - __c_entry;
>+	return __s_entry + __delta;

I think this should be:

   using __w_dur = typename __wait_clock_t::duration;
   return __s_entry + chrono::ceil<__w_dur>(__delta);


>+      }
>+
>+    template<typename _Dur>
>+      __wait_clock_t::time_point
>+      __to_wait_clock(const chrono::time_point<__wait_clock_t,
>+					       _Dur>& __atime) noexcept
>+      { return __atime; }

And strictly speaking, this should be:

   return chrono::ceil<typename __wait_clock_t::duration>(__atime);

but it only matters if somebody passes in a time_point with a
sub-nanosecond (or floating-point) duration. So I guess there's no
need to change it.


>-    struct __timed_waiters : __waiters
>+    struct __timed_waiters : __waiters_base
>     {
>-      template<typename _Clock, typename _Duration>
>-	__atomic_wait_status
>-	_M_do_wait_until(__platform_wait_t __version,
>-			 const chrono::time_point<_Clock, _Duration>& __atime)
>+      // returns true if wait ended before timeout
>+      template<typename _Clock, typename _Dur>
>+	bool
>+	_M_do_wait_until(__platform_wait_t* __addr, __platform_wait_t __old,
>+			 const chrono::time_point<_Clock, _Dur>& __atime)
> 	{
>-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>-	  return __detail::__platform_wait_until(&_M_ver, __version, __atime);
>+#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>+	  return __platform_wait_until(__addr, __old, __atime);
> #else
>-	  __platform_wait_t __cur = 0;
>-	  __waiters::__lock_t __l(_M_mtx);
>-	  while (__cur <= __version)
>+	  __platform_wait_t __val;
>+	  __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
>+	  if (__val == __old)
> 	    {
>-	      if (__detail::__cond_wait_until(_M_cv, _M_mtx, __atime)
>-		    == __atomic_wait_status::timeout)
>-		return __atomic_wait_status::timeout;
>-
>-	      __platform_wait_t __last = __cur;
>-	      __atomic_load(&_M_ver, &__cur, __ATOMIC_ACQUIRE);
>-	      if (__cur < __last)
>-		break; // break the loop if version overflows
>+	      lock_guard<mutex>__l(_M_mtx);

Missing space before the __l name.

>@@ -184,115 +238,238 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> #endif
>       }
>
>-      static __waiters&
>-      _S_for(const void* __t)
>+      static __waiters_base&
>+      _S_for(const void* __addr)

This can be noexcept.

>       {
>-	const unsigned char __mask = 0xf;
>-	static __waiters __w[__mask + 1];
>-
>-	auto __key = _Hash_impl::hash(__t) & __mask;
>+	constexpr uintptr_t __ct = 16;
>+	static __waiters_base __w[__ct];
>+	auto __key = (uintptr_t(__addr) >> 2) % __ct;
> 	return __w[__key];
>       }
>     };
>
>-    struct __waiter
>+    struct __waiters : __waiters_base
>     {
>-      __waiters& _M_w;
>-      __platform_wait_t _M_version;
>-
>-      template<typename _Tp>
>-	__waiter(const _Tp* __addr) noexcept
>-	  : _M_w(__waiters::_S_for(static_cast<const void*>(__addr)))
>-	  , _M_version(_M_w._M_enter_wait())
>-	{ }
>-
>-      ~__waiter()
>-      { _M_w._M_leave_wait(); }
>-
>-      void _M_do_wait() noexcept
>-      { _M_w._M_do_wait(_M_version); }
>+      void
>+      _M_do_wait(const __platform_wait_t* __addr, __platform_wait_t __old) noexcept
>+      {
>+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
>+	__platform_wait(__addr, __old);
>+#else
>+	__platform_wait_t __val;
>+	__atomic_load(_M_addr, &__val, __ATOMIC_RELAXED);
>+	if (__val == __old)
>+	  {
>+	    lock_guard<mutex> __l(_M_mtx);
>+	    _M_cv.wait(_M_mtx);
>+	  }
>+#endif // __GLIBCXX_HAVE_PLATFORM_WAIT
>+      }
>     };
>
>-    inline void
>-    __thread_relax() noexcept
>-    {
>-#if defined __i386__ || defined __x86_64__
>-      __builtin_ia32_pause();
>-#elif defined _GLIBCXX_USE_SCHED_YIELD
>-      __gthread_yield();
>-#endif
>-    }
>+    template<typename _Tp, typename _EntersWait>
>+      struct __waiter_base
>+      {
>+	using __waiter_type = _Tp;
>
>-    inline void
>-    __thread_yield() noexcept
>-    {
>-#if defined _GLIBCXX_USE_SCHED_YIELD
>-     __gthread_yield();
>-#endif
>-    }
>+	__waiter_type& _M_w;
>+	__platform_wait_t* _M_addr;
>
>+	template<typename _Up>
>+	  static __platform_wait_t*
>+	  _S_wait_addr(const _Up* __a, __platform_wait_t* __b)
>+	  {
>+	    if constexpr (__platform_wait_uses_type<_Up>)
>+	      return reinterpret_cast<__platform_wait_t*>(const_cast<_Up*>(__a));
>+	    else
>+	      return __b;
>+	  }
>+
>+	template<typename _Up>
>+	  static __waiter_type&
>+	  _S_for(const _Up* __addr)

Why is this a function template? It doesn't depend on _Up at all. It
just casts the _Up* to void* so might as well take a void* parameter,
no?

>+	  {
>+	    static_assert(sizeof(__waiter_type) == sizeof(__waiters_base));
>+	    auto& res = __waiters_base::_S_for(static_cast<const void*>(__addr));
>+	    return reinterpret_cast<__waiter_type&>(res);
>+	  }
>+
>+	template<typename _Up>
>+	  explicit __waiter_base(const _Up* __addr) noexcept
>+	    : _M_w(_S_for(__addr))
>+	    , _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
>+	  {
>+	    if constexpr (_EntersWait::value)
>+	      _M_w._M_enter_wait();
>+	  }
>+
>+	template<typename _Up>
>+	  __waiter_base(const _Up* __addr, std::false_type) noexcept

This constructor doesn't seem to be used anywhere.

>+	    : _M_w(_S_for(__addr))
>+	    , _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
>+	  { }
>+
>+	~__waiter_base()
>+	{
>+	  if constexpr (_EntersWait::value)
>+	    _M_w._M_leave_wait();
>+	}
>+
>+	void
>+	_M_notify(bool __all)
>+	{
>+	  if (_M_addr == &_M_w._M_ver)
>+	    __atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL);
>+	  _M_w._M_notify(_M_addr, __all);
>+	}
>+
>+	template<typename _Up, typename _ValFn,
>+		 typename _Spin = __default_spin_policy>
>+	  static bool
>+	  _S_do_spin_v(__platform_wait_t* __addr,
>+		       const _Up& __old, _ValFn __vfn,
>+		       __platform_wait_t& __val,
>+		       _Spin __spin = _Spin{ })
>+	  {
>+	    auto const __pred = [=]
>+	      { return __atomic_compare(__old, __vfn()); };
>+
>+	    if constexpr (__platform_wait_uses_type<_Up>)
>+	      {
>+		__val == __old;
>+	      }
>+	    else
>+	      {
>+		__atomic_load(__addr, &__val, __ATOMIC_RELAXED);
>+	      }
>+	    return __atomic_spin(__pred, __spin);
>+	  }
>+
>+	template<typename _Up, typename _ValFn,
>+		 typename _Spin = __default_spin_policy>
>+	  bool
>+	  _M_do_spin_v(const _Up& __old, _ValFn __vfn,
>+		       __platform_wait_t& __val,
>+		       _Spin __spin = _Spin{ })
>+	  { return _S_do_spin_v(_M_addr, __old, __vfn, __val, __spin); }
>+
>+	template<typename _Pred,
>+		 typename _Spin = __default_spin_policy>
>+	  static bool
>+	  _S_do_spin(const __platform_wait_t* __addr,
>+		     _Pred __pred,
>+		     __platform_wait_t& __val,
>+		     _Spin __spin = _Spin{ })
>+	  {
>+	    __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
>+	    return __atomic_spin(__pred, __spin);
>+	  }
>+
>+	template<typename _Pred,
>+		 typename _Spin = __default_spin_policy>
>+	  bool
>+	  _M_do_spin(_Pred __pred, __platform_wait_t& __val,
>+	             _Spin __spin = _Spin{ })
>+	  { return _S_do_spin(_M_addr, __pred, __val, __spin); }
>+      };
>+
>+    template<typename _EntersWait>
>+      struct __waiter : __waiter_base<__waiters, _EntersWait>
>+      {
>+	using __base_type = __waiter_base<__waiters, _EntersWait>;

Why does the base class depend on _EntersWait? That causes all the
code in the base to be duplicated for the two specializations (true
and false). The only parts that differ are the constructor and
destructor, so the derived class could do that, couldn't it?

i.e. have

     template<typename _Tp>
       struct __waiter_base

as the base, then __waiter<_EntersWait> does the _M_enter_wait and
_M_leave_wait calls in its ctor and dtor.

That way we only instantiate two specializations of the base,
__waiter_base<__waiters> and __waiter_base<__timed_waiters>, rather
than four.


>   template<typename _Tp>
>     void
>-    __atomic_notify(const _Tp* __addr, bool __all) noexcept
>+    __atomic_notify_address(const _Tp* __addr, bool __all) noexcept
>     {
>-      using namespace __detail;
>-      auto& __w = __waiters::_S_for((void*)__addr);
>-      if (!__w._M_waiting())
>-	return;
>-
>-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>-      if constexpr (__platform_wait_uses_type<_Tp>)
>-	{
>-	  __platform_notify((__platform_wait_t*)(void*) __addr, __all);
>-	}
>-      else
>-#endif
>-	{
>-	  __w._M_notify(__all);
>-	}
>+      __detail::__bare_wait __w(__addr);

Should this be __enters_wait not __bare_wait ?

>+      __w._M_notify(__all);
>     }
>+
>+  // This call is to be used by atomic types which track contention externally
>+  inline void
>+  __atomic_notify_address_bare(const __detail::__platform_wait_t* __addr,
>+			       bool __all) noexcept
>+  {
>+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
>+    __detail::__platform_notify(__addr, __all);
>+#else
>+    __detail::__bare_wait __w(__addr);
>+    __w._M_notify(__all);
>+#endif
>+  }
> _GLIBCXX_END_NAMESPACE_VERSION
> } // namespace std
> #endif // GTHREADS || LINUX_FUTEX
>diff --git a/libstdc++-v3/include/bits/semaphore_base.h b/libstdc++-v3/include/bits/semaphore_base.h
>index b65717e64d7..c21624e0988 100644
>--- a/libstdc++-v3/include/bits/semaphore_base.h
>+++ b/libstdc++-v3/include/bits/semaphore_base.h

[snip]

>-    private:
>-      alignas(__alignof__(_Tp)) _Tp _M_counter;
>-    };
>+  private:
>+    __detail::__platform_wait_t _M_counter;

We still need to force the alignment here.

Jakub said on IRC that m68k might have alignof(int) == 2, so we need
to increase that alignment to 4 to use it as a futex.

For the case where __platform_wait_t is int, we want alignas(4) but I
suppose on a hypothetical platform where we use a 64-bit type as
__platform_wait_t that would be wrong.

Maybe we want a new constant defined alongside the __platform_wait_t
which specifies the requried alignment, then use:

   alignas(__detail::__platform_wait_alignment) __detail::__platform_wait_t
     _M_counter;

Or use alignas(atomic_ref<__platform_wait_t>::required_alignment).