From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by sourceware.org (Postfix) with ESMTP id D6C703858025 for ; Thu, 15 Apr 2021 12:46:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org D6C703858025 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-531-Eb30spiHOkyhQWCusf8sEA-1; Thu, 15 Apr 2021 08:46:19 -0400 X-MC-Unique: Eb30spiHOkyhQWCusf8sEA-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 8F17579EC2; Thu, 15 Apr 2021 12:46:18 +0000 (UTC) Received: from localhost (unknown [10.33.36.164]) by smtp.corp.redhat.com (Postfix) with ESMTP id F07945D9C0; Thu, 15 Apr 2021 12:46:17 +0000 (UTC) Date: Thu, 15 Apr 2021 13:46:17 +0100 From: Jonathan Wakely To: Thomas Rodgers Cc: gcc-patches@gcc.gnu.org, libstdc++@gcc.gnu.org, trodgers@redhat.com, Thomas Rodgers Subject: Re: [PATCH] [libstdc++] Refactor/cleanup of atomic wait implementation Message-ID: <20210415124617.GN3008@redhat.com> References: <20210303173123.GV3008@redhat.com> <20210323190052.261853-1-rodgert@appliantology.com> MIME-Version: 1.0 In-Reply-To: <20210323190052.261853-1-rodgert@appliantology.com> X-Clacks-Overhead: GNU Terry Pratchett X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline X-Spam-Status: No, score=-14.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libstdc++@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++ mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2021 12:46:24 -0000 On 23/03/21 12:00 -0700, Thomas Rodgers wrote: >From: Thomas Rodgers > >* This patch addresses jwakely's previous feedback. >* This patch also subsumes thiago.macieira@intel.com 's 'Uncontroversial If this part is intended as part of the commit msg let's put Thiago's name rather than email address, but I'm assuming this preamble isn't intended for the commit anyway. > improvements to C++20 wait-related implementation'. >* This patch also changes the atomic semaphore implementation to avoid > checking for any waiters before a FUTEX_WAKE op. > >This is a substantial rewrite of the atomic wait/notify (and timed wait >counterparts) implementation. > >The previous __platform_wait looped on EINTR however this behavior is >not required by the standard. A new _GLIBCXX_HAVE_PLATFORM_WAIT macro >now controls whether wait/notify are implemented using a platform >specific primitive or with a platform agnostic mutex/condvar. This >patch only supplies a definition for linux futexes. A future update >could add support __ulock_wait/wake on Darwin, for instance. > >The members of __waiters were lifted to a new base class. The members >are now arranged such that overall sizeof(__waiters_base) fits in two >cache lines (on platforms with at least 64 byte cache lines). The >definition will also use destructive_interference_size for this if it >is available. N.B. that makes the ABI potentially different with different compilers, e.g. if you compile it today it will use 64, but then you compile it with some future version of Clang that defines the interference sizes it might use a different value. That's OK for now, but is something to be aware of and remember. >The __waiters type is now specific to untimed waits. Timed waits have a >corresponding __timed_waiters type. Much of the code has been moved from >the previous __atomic_wait() free function to the __waiter_base template >and a __waiter derived type is provided to implement the un-timed wait >operations. A similar change has been made to the timed wait >implementation. While reading this code I keep getting confused between __waiter singular and __waiters plural. Would something like __waiter_pool or __waiters_mgr work instead of __waiters? >The __atomic_spin code has been extended to take a spin policy which is >invoked after the initial busy wait loop. The default policy is to >return from the spin. The timed wait code adds a timed backoff spinning >policy. The code from which implements this_thread::sleep_for, >sleep_until has been moved to a new header >which allows the thread sleep code to be consumed without pulling in the >whole of . The new header is misnamed. The existing headers all define std::foo, but this doesn't define std::thread::sleep* or std::thread_sleep*. I think would be fine, or if you prefer that. The original reason I introduced was that seemed too likely to clash with something in glibc or another project using "bits" as a prefix, so I figured std_mutex.h for std::mutex would be safer. I had the same concern for and so that's too, but I think thread_sleep is probably sufficiently un-clashy, and this_thread_sleep definitely so. >The entry points into the wait/notify code have been restructured to >support either - > * Testing the current value of the atomic stored at the given address > and waiting on a notification. > * Applying a predicate to determine if the wait was satisfied. >The entry points were renamed to make it clear that the wait and wake >operations operate on addresses. The first variant takes the expected >value and a function which returns the current value that should be used >in comparison operations, these operations are named with a _v suffix >(e.g. 'value'). All atomic<_Tp> wait/notify operations use the first >variant. Barriers, latches and semaphores use the predicate variant. > >This change also centralizes what it means to compare values for the >purposes of atomic::wait rather than scattering through individual >predicates. I like this a lot more, thanks. >diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h >index 2dc00676054..2e46691c59a 100644 >--- a/libstdc++-v3/include/bits/atomic_base.h >+++ b/libstdc++-v3/include/bits/atomic_base.h >@@ -1017,8 +1015,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > wait(const _Tp* __ptr, _Val<_Tp> __old, > memory_order __m = memory_order_seq_cst) noexcept > { >- std::__atomic_wait(__ptr, __old, >- [=]() { return load(__ptr, __m) == __old; }); >+ std::__atomic_wait_address_v(__ptr, __old, >+ [__ptr, __m]() { return load(__ptr, __m); }); Pre-existing, but __ptr is dependent here so this needs to call __atomic_impl::load to prevent ADL. >diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h >index a0c5ef4374e..4b876236d2b 100644 >--- a/libstdc++-v3/include/bits/atomic_timed_wait.h >+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h >@@ -36,6 +36,7 @@ > > #if __cpp_lib_atomic_wait > #include >+#include > > #include > >@@ -48,19 +49,34 @@ namespace std _GLIBCXX_VISIBILITY(default) > { > _GLIBCXX_BEGIN_NAMESPACE_VERSION > >- enum class __atomic_wait_status { no_timeout, timeout }; >- > namespace __detail > { >-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX >- using __platform_wait_clock_t = chrono::steady_clock; >+ using __wait_clock_t = chrono::steady_clock; > >- template >- __atomic_wait_status >- __platform_wait_until_impl(__platform_wait_t* __addr, >- __platform_wait_t __val, >- const chrono::time_point< >- __platform_wait_clock_t, _Duration>& >+ template >+ __wait_clock_t::time_point >+ __to_wait_clock(const chrono::time_point<_Clock, _Dur>& __atime) noexcept >+ { >+ const typename _Clock::time_point __c_entry = _Clock::now(); >+ const __wait_clock_t::time_point __s_entry = __wait_clock_t::now(); This is copy&pasted from elsewhere where the "s" prefix is for system_clock (or steady_clock) so maybe here we want__w_entry for wait clock? >+ const auto __delta = __atime - __c_entry; >+ return __s_entry + __delta; I think this should be: using __w_dur = typename __wait_clock_t::duration; return __s_entry + chrono::ceil<__w_dur>(__delta); >+ } >+ >+ template >+ __wait_clock_t::time_point >+ __to_wait_clock(const chrono::time_point<__wait_clock_t, >+ _Dur>& __atime) noexcept >+ { return __atime; } And strictly speaking, this should be: return chrono::ceil(__atime); but it only matters if somebody passes in a time_point with a sub-nanosecond (or floating-point) duration. So I guess there's no need to change it. >- struct __timed_waiters : __waiters >+ struct __timed_waiters : __waiters_base > { >- template >- __atomic_wait_status >- _M_do_wait_until(__platform_wait_t __version, >- const chrono::time_point<_Clock, _Duration>& __atime) >+ // returns true if wait ended before timeout >+ template >+ bool >+ _M_do_wait_until(__platform_wait_t* __addr, __platform_wait_t __old, >+ const chrono::time_point<_Clock, _Dur>& __atime) > { >-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX >- return __detail::__platform_wait_until(&_M_ver, __version, __atime); >+#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT >+ return __platform_wait_until(__addr, __old, __atime); > #else >- __platform_wait_t __cur = 0; >- __waiters::__lock_t __l(_M_mtx); >- while (__cur <= __version) >+ __platform_wait_t __val; >+ __atomic_load(__addr, &__val, __ATOMIC_RELAXED); >+ if (__val == __old) > { >- if (__detail::__cond_wait_until(_M_cv, _M_mtx, __atime) >- == __atomic_wait_status::timeout) >- return __atomic_wait_status::timeout; >- >- __platform_wait_t __last = __cur; >- __atomic_load(&_M_ver, &__cur, __ATOMIC_ACQUIRE); >- if (__cur < __last) >- break; // break the loop if version overflows >+ lock_guard__l(_M_mtx); Missing space before the __l name. >@@ -184,115 +238,238 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > #endif > } > >- static __waiters& >- _S_for(const void* __t) >+ static __waiters_base& >+ _S_for(const void* __addr) This can be noexcept. > { >- const unsigned char __mask = 0xf; >- static __waiters __w[__mask + 1]; >- >- auto __key = _Hash_impl::hash(__t) & __mask; >+ constexpr uintptr_t __ct = 16; >+ static __waiters_base __w[__ct]; >+ auto __key = (uintptr_t(__addr) >> 2) % __ct; > return __w[__key]; > } > }; > >- struct __waiter >+ struct __waiters : __waiters_base > { >- __waiters& _M_w; >- __platform_wait_t _M_version; >- >- template >- __waiter(const _Tp* __addr) noexcept >- : _M_w(__waiters::_S_for(static_cast(__addr))) >- , _M_version(_M_w._M_enter_wait()) >- { } >- >- ~__waiter() >- { _M_w._M_leave_wait(); } >- >- void _M_do_wait() noexcept >- { _M_w._M_do_wait(_M_version); } >+ void >+ _M_do_wait(const __platform_wait_t* __addr, __platform_wait_t __old) noexcept >+ { >+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT >+ __platform_wait(__addr, __old); >+#else >+ __platform_wait_t __val; >+ __atomic_load(_M_addr, &__val, __ATOMIC_RELAXED); >+ if (__val == __old) >+ { >+ lock_guard __l(_M_mtx); >+ _M_cv.wait(_M_mtx); >+ } >+#endif // __GLIBCXX_HAVE_PLATFORM_WAIT >+ } > }; > >- inline void >- __thread_relax() noexcept >- { >-#if defined __i386__ || defined __x86_64__ >- __builtin_ia32_pause(); >-#elif defined _GLIBCXX_USE_SCHED_YIELD >- __gthread_yield(); >-#endif >- } >+ template >+ struct __waiter_base >+ { >+ using __waiter_type = _Tp; > >- inline void >- __thread_yield() noexcept >- { >-#if defined _GLIBCXX_USE_SCHED_YIELD >- __gthread_yield(); >-#endif >- } >+ __waiter_type& _M_w; >+ __platform_wait_t* _M_addr; > >+ template >+ static __platform_wait_t* >+ _S_wait_addr(const _Up* __a, __platform_wait_t* __b) >+ { >+ if constexpr (__platform_wait_uses_type<_Up>) >+ return reinterpret_cast<__platform_wait_t*>(const_cast<_Up*>(__a)); >+ else >+ return __b; >+ } >+ >+ template >+ static __waiter_type& >+ _S_for(const _Up* __addr) Why is this a function template? It doesn't depend on _Up at all. It just casts the _Up* to void* so might as well take a void* parameter, no? >+ { >+ static_assert(sizeof(__waiter_type) == sizeof(__waiters_base)); >+ auto& res = __waiters_base::_S_for(static_cast(__addr)); >+ return reinterpret_cast<__waiter_type&>(res); >+ } >+ >+ template >+ explicit __waiter_base(const _Up* __addr) noexcept >+ : _M_w(_S_for(__addr)) >+ , _M_addr(_S_wait_addr(__addr, &_M_w._M_ver)) >+ { >+ if constexpr (_EntersWait::value) >+ _M_w._M_enter_wait(); >+ } >+ >+ template >+ __waiter_base(const _Up* __addr, std::false_type) noexcept This constructor doesn't seem to be used anywhere. >+ : _M_w(_S_for(__addr)) >+ , _M_addr(_S_wait_addr(__addr, &_M_w._M_ver)) >+ { } >+ >+ ~__waiter_base() >+ { >+ if constexpr (_EntersWait::value) >+ _M_w._M_leave_wait(); >+ } >+ >+ void >+ _M_notify(bool __all) >+ { >+ if (_M_addr == &_M_w._M_ver) >+ __atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL); >+ _M_w._M_notify(_M_addr, __all); >+ } >+ >+ template+ typename _Spin = __default_spin_policy> >+ static bool >+ _S_do_spin_v(__platform_wait_t* __addr, >+ const _Up& __old, _ValFn __vfn, >+ __platform_wait_t& __val, >+ _Spin __spin = _Spin{ }) >+ { >+ auto const __pred = [=] >+ { return __atomic_compare(__old, __vfn()); }; >+ >+ if constexpr (__platform_wait_uses_type<_Up>) >+ { >+ __val == __old; >+ } >+ else >+ { >+ __atomic_load(__addr, &__val, __ATOMIC_RELAXED); >+ } >+ return __atomic_spin(__pred, __spin); >+ } >+ >+ template+ typename _Spin = __default_spin_policy> >+ bool >+ _M_do_spin_v(const _Up& __old, _ValFn __vfn, >+ __platform_wait_t& __val, >+ _Spin __spin = _Spin{ }) >+ { return _S_do_spin_v(_M_addr, __old, __vfn, __val, __spin); } >+ >+ template+ typename _Spin = __default_spin_policy> >+ static bool >+ _S_do_spin(const __platform_wait_t* __addr, >+ _Pred __pred, >+ __platform_wait_t& __val, >+ _Spin __spin = _Spin{ }) >+ { >+ __atomic_load(__addr, &__val, __ATOMIC_RELAXED); >+ return __atomic_spin(__pred, __spin); >+ } >+ >+ template+ typename _Spin = __default_spin_policy> >+ bool >+ _M_do_spin(_Pred __pred, __platform_wait_t& __val, >+ _Spin __spin = _Spin{ }) >+ { return _S_do_spin(_M_addr, __pred, __val, __spin); } >+ }; >+ >+ template >+ struct __waiter : __waiter_base<__waiters, _EntersWait> >+ { >+ using __base_type = __waiter_base<__waiters, _EntersWait>; Why does the base class depend on _EntersWait? That causes all the code in the base to be duplicated for the two specializations (true and false). The only parts that differ are the constructor and destructor, so the derived class could do that, couldn't it? i.e. have template struct __waiter_base as the base, then __waiter<_EntersWait> does the _M_enter_wait and _M_leave_wait calls in its ctor and dtor. That way we only instantiate two specializations of the base, __waiter_base<__waiters> and __waiter_base<__timed_waiters>, rather than four. > template > void >- __atomic_notify(const _Tp* __addr, bool __all) noexcept >+ __atomic_notify_address(const _Tp* __addr, bool __all) noexcept > { >- using namespace __detail; >- auto& __w = __waiters::_S_for((void*)__addr); >- if (!__w._M_waiting()) >- return; >- >-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX >- if constexpr (__platform_wait_uses_type<_Tp>) >- { >- __platform_notify((__platform_wait_t*)(void*) __addr, __all); >- } >- else >-#endif >- { >- __w._M_notify(__all); >- } >+ __detail::__bare_wait __w(__addr); Should this be __enters_wait not __bare_wait ? >+ __w._M_notify(__all); > } >+ >+ // This call is to be used by atomic types which track contention externally >+ inline void >+ __atomic_notify_address_bare(const __detail::__platform_wait_t* __addr, >+ bool __all) noexcept >+ { >+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT >+ __detail::__platform_notify(__addr, __all); >+#else >+ __detail::__bare_wait __w(__addr); >+ __w._M_notify(__all); >+#endif >+ } > _GLIBCXX_END_NAMESPACE_VERSION > } // namespace std > #endif // GTHREADS || LINUX_FUTEX >diff --git a/libstdc++-v3/include/bits/semaphore_base.h b/libstdc++-v3/include/bits/semaphore_base.h >index b65717e64d7..c21624e0988 100644 >--- a/libstdc++-v3/include/bits/semaphore_base.h >+++ b/libstdc++-v3/include/bits/semaphore_base.h [snip] >- private: >- alignas(__alignof__(_Tp)) _Tp _M_counter; >- }; >+ private: >+ __detail::__platform_wait_t _M_counter; We still need to force the alignment here. Jakub said on IRC that m68k might have alignof(int) == 2, so we need to increase that alignment to 4 to use it as a futex. For the case where __platform_wait_t is int, we want alignas(4) but I suppose on a hypothetical platform where we use a 64-bit type as __platform_wait_t that would be wrong. Maybe we want a new constant defined alongside the __platform_wait_t which specifies the requried alignment, then use: alignas(__detail::__platform_wait_alignment) __detail::__platform_wait_t _M_counter; Or use alignas(atomic_ref<__platform_wait_t>::required_alignment).