From: Jonathan Wakely
Date: Sat, 9 Mar 2024 12:18:52 +0000
Subject: Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization
To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Cc: Thomas Rodgers, Nate Eldredge

On Thu, 16 Nov 2023 at 13:49, Jonathan Wakely wrote: > From: Thomas Rodgers > > These two patches were written by Tom earlier this year, before he left > Red Hat. We should finish reviewing them for GCC 14 (and probably squash > them into one?) > > Tom, you mentioned further work that changes the __platform_wait_t* to > uintptr_t, is that ready, or likely to be ready very soon? > > Tested x86_64-linux, testing underway on powerpc-aix and sparc-solaris. > > > -- >8 -- > > This represents a major refactoring of the previous atomic::wait > and atomic::notify implementation detail. The aim of this change > is to simplify the implementation details and position the resulting > implementation so that much of the current header-only detail > can be moved into the shared library, while also accounting for > anticipated changes to wait/notify functionality for C++26. > > The previous implementation implemented spin logic in terms of > the types __default_spin_policy, __timed_backoff_spin_policy, and > the free function __atomic_spin. These are replaced in favor of > two new free functions: __spin_impl and __spin_until_impl. These > currently inline free functions are expected to be moved into the > libstdc++ shared library in a future commit. > > The previous implementation derived untimed and timed wait > implementation detail from __detail::__waiter_pool_base. This > is-a relationship is removed in the new version and the previous > implementation detail is renamed to reflect this change. The > static _S_for member has been renamed as well to indicate that it > returns the __waiter_pool_impl entry in the static 'side table' > for a given awaited address. > > This new implementation replaces all of the non-templated waiting > detail of __waiter_base, __waiter_pool, __waiter, __enters_wait, and > __bare_wait with the __wait_impl free function, and the supporting > __wait_flags enum and __wait_args struct. This currently inline free > function is expected to be moved into the libstdc++ shared library > in a future commit. > > This new implementation replaces all of the non-templated notifying > detail of __waiter_base, __waiter_pool, and __waiter with the > __notify_impl free function. This currently inline free function > is expected to be moved into the libstdc++ shared library in a > future commit.
> > The __atomic_wait_address template function is updated to account > for the above changes and to support the expected C++26 change to > pass the most recent observed value to the caller supplied predicate. > > A new non-templated __atomic_wait_address_v free function is added > that only works for atomic types that operate only on __platform_wait_t > and requires the caller to supply a memory order. This is intended > to be the simplest code path for such types. > > The __atomic_wait_address_v template function is now implemented in > terms of the new __atomic_wait_address template and continues to accept > a user supplied "value function" to retrieve the current value of > the atomic. > > The __atomic_notify_address template function is updated to account > for the above changes. > > The template __platform_wait_until_impl is renamed to > __platform_wait_until. The previous __platform_wait_until template is deleted > and the functionality previously provided is moved to the new template > function __wait_until. A similar change is made to the > __cond_wait_until_impl/__cond_wait_until implementation. > > This new implementation similarly replaces all of the non-templated > waiting detail of __timed_waiter_pool, __timed_waiter, etc. with > the new __wait_until_impl free function. This currently inline free > function is expected to be moved into the libstdc++ shared library > in a future commit. > > This implementation replaces all templated waiting functions that > manage clock conversion as well as relative waiting (wait_for) with > the new template functions __wait_until and __wait_for. > > Similarly the previous implementation detail for the various > __atomic_wait_address_Xxx templates is adjusted to account for the > implementation changes outlined above. > > All of the "bare wait" versions of __atomic_wait_Xxx have been removed > and replaced with a defaulted boolean __bare_wait parameter on the > new version of these templates. > > libstdc++-v3/ChangeLog: > > * include/bits/atomic_timed_wait.h: > (__detail::__platform_wait_until_impl): Rename to > __platform_wait_until. > (__detail::__platform_wait_until): Remove previous > definition. > (__detail::__cond_wait_until_impl): Rename to > __cond_wait_until. > (__detail::__cond_wait_until): Remove previous > definition. > (__detail::__spin_until_impl): New function. > (__detail::__wait_until_impl): New function. > (__detail::__wait_until): New function. > (__detail::__wait_for): New function. > (__detail::__timed_waiter_pool): Remove type. > (__detail::__timed_backoff_spin_policy): Remove type. > (__detail::__timed_waiter): Remove type. > (__detail::__enters_timed_wait): Remove type alias. > (__detail::__bare_timed_wait): Remove type alias. > (__atomic_wait_address_until): Adjust to new implementation > detail. > (__atomic_wait_address_until_v): Likewise. > (__atomic_wait_address_bare): Remove. > (__atomic_wait_address_for): Adjust to new implementation > detail. > (__atomic_wait_address_for_v): Likewise. > (__atomic_wait_address_for_bare): Remove. > * include/bits/atomic_wait.h: Include bits/stl_pair.h. > (__detail::__default_spin_policy): Remove type. > (__detail::__atomic_spin): Remove function. > (__detail::__waiter_pool_base): Rename to __waiter_pool_impl. > Remove _M_notify. Rename _S_for to _S_impl_for. > (__detail::__waiter_base): Remove type. > (__detail::__waiter_pool): Remove type. > (__detail::__waiter): Remove type. > (__detail::__enters_wait): Remove type alias. > (__detail::__bare_wait): Remove type alias. 
> (__detail::__wait_flags): New enum. > (__detail::__wait_args): New struct. > (__detail::__wait_result_type): New type alias. > (__detail::__spin_impl): New function. > (__detail::__wait_impl): New function. > (__atomic_wait_address): Adjust to new implementation detail. > (__atomic_wait_address_v): Likewise. > (__atomic_notify_address): Likewise. > (__atomic_wait_address_bare): Delete. > (__atomic_notify_address_bare): Likewise. > * include/bits/semaphore_base.h: Adjust implementation to > use new __atomic_wait_address_v contract. > * include/std/barrier: Adjust implementation to use new > __atomic_wait contract. > * include/std/latch: Adjust implementation to use new > __atomic_wait contract. > * testsuite/29_atomics/atomic/wait_notify/100334.cc (main): > Adjust to for __detail::__waiter_pool_base renaming. > --- > libstdc++-v3/include/bits/atomic_timed_wait.h | 549 ++++++++---------- > libstdc++-v3/include/bits/atomic_wait.h | 486 ++++++++-------- > libstdc++-v3/include/bits/semaphore_base.h | 53 +- > libstdc++-v3/include/std/barrier | 6 +- > libstdc++-v3/include/std/latch | 5 +- > .../29_atomics/atomic/wait_notify/100334.cc | 4 +- > 6 files changed, 514 insertions(+), 589 deletions(-) > > diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h > b/libstdc++-v3/include/bits/atomic_timed_wait.h > index ba21ab20a11..90427ef8b42 100644 > --- a/libstdc++-v3/include/bits/atomic_timed_wait.h > +++ b/libstdc++-v3/include/bits/atomic_timed_wait.h > @@ -74,62 +74,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > #ifdef _GLIBCXX_HAVE_LINUX_FUTEX > #define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT > // returns true if wait ended before timeout > - template > - bool > - __platform_wait_until_impl(const __platform_wait_t* __addr, > - __platform_wait_t __old, > - const chrono::time_point<__wait_clock_t, > _Dur>& > - __atime) noexcept > - { > - auto __s = chrono::time_point_cast(__atime); > - auto __ns = chrono::duration_cast(__atime - > __s); > For much of the mail below I've removed many of the - lines in the diff, because the changes are huge and often the removed - lines are completely unrelated to the added + lines adjacent to them. Just seeing the new code without unrelated code next to it is clearer. + bool > + __platform_wait_until(const __platform_wait_t* __addr, > + __platform_wait_t __old, > + const __wait_clock_t::time_point& __atime) > noexcept > + { > + auto __s = chrono::time_point_cast(__atime); > + auto __ns = chrono::duration_cast(__atime - > __s); > > - struct timespec __rt = > + struct timespec __rt = > { > static_cast(__s.time_since_epoch().count()), > static_cast(__ns.count()) > }; > > + auto __e = syscall (SYS_futex, __addr, > + > static_cast(__futex_wait_flags::__wait_bitset_private), > + __old, &__rt, nullptr, > + > static_cast(__futex_wait_flags::__bitset_match_any)); > + if (__e) > + { > + if (errno == ETIMEDOUT) > return false; > - } > + if (errno != EINTR && errno != EAGAIN) > + __throw_system_error(errno); > + } > + return true; > } > #else > // define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement > __platform_wait_until() > @@ -139,15 +109,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > #ifdef _GLIBCXX_HAS_GTHREADS > // Returns true if wait ended before timeout. 
> > + bool > + __cond_wait_until(__condvar& __cv, mutex& __mx, > + const __wait_clock_t::time_point& __atime) > + { > auto __s = chrono::time_point_cast(__atime); > auto __ns = chrono::duration_cast(__atime - > __s); > > @@ -158,293 +123,261 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > }; > > #ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT > - if constexpr (is_same_v) > + if constexpr (is_same_v) > __cv.wait_until(__mx, CLOCK_MONOTONIC, __ts); > else > #endif > __cv.wait_until(__mx, __ts); > > + return __wait_clock_t::now() < __atime; > } > #endif // _GLIBCXX_HAS_GTHREADS > > + inline __wait_result_type > + __spin_until_impl(const __platform_wait_t* __addr, __wait_args __args, > + const __wait_clock_t::time_point& __deadline) > { > > + auto __t0 = __wait_clock_t::now(); > + using namespace literals::chrono_literals; > + > + __platform_wait_t __val; > + auto __now = __wait_clock_t::now(); > + for (; __now < __deadline; __now = __wait_clock_t::now()) > + { > + auto __elapsed = __now - __t0; > +#ifndef _GLIBCXX_NO_SLEEP > + if (__elapsed > 128ms) > + { > + this_thread::sleep_for(64ms); > + } > + else if (__elapsed > 64us) > + { > + this_thread::sleep_for(__elapsed / 2); > + } > + else > +#endif > + if (__elapsed > 4us) > + { > + __thread_yield(); > + } > + else > + { > + auto __res = __detail::__spin_impl(__addr, __args); > + if (__res.first) > + return __res; > + } > + > + __atomic_load(__addr, &__val, __args._M_order); > + if (__val != __args._M_old) > + return make_pair(true, __val); > + } > + return make_pair(false, __val); > + } > + > + inline __wait_result_type > + __wait_until_impl(const __platform_wait_t* __addr, __wait_args __args, > + const __wait_clock_t::time_point& __atime) > + { > +#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT > + __waiter_pool_impl* __pool = nullptr; > +#else > + // if we don't have __platform_wait, we always need the side-table > + __waiter_pool_impl* __pool = > &__waiter_pool_impl::_S_impl_for(__addr); > +#endif > + > + __platform_wait_t* __wait_addr; > + if (__args & __wait_flags::__proxy_wait) > { > #ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT > > > + __pool = &__waiter_pool_impl::_S_impl_for(__addr); > +#endif > + __wait_addr = &__pool->_M_ver; > + __atomic_load(__wait_addr, &__args._M_old, __args._M_order); > } > > + else > + __wait_addr = const_cast<__platform_wait_t*>(__addr); > > + if (__args & __wait_flags::__do_spin) > + { > + auto __res = __detail::__spin_until_impl(__wait_addr, __args, > __atime); > + if (__res.first) > + return __res; > + if (__args & __wait_flags::__spin_only) > + return __res; > As observed by Nate Eldredge, this early return is always taken, even if the caller didn't set __spin_only. That's because args & __spin_only is args & 12 which is true if do_spin is set (which it must be or we wouldn't have entered this block). That means that any call to __wait_impl with __do_spin set in the flags (which is in the default flags) will return before ever waiting, resulting in the caller doing a busy loop. + } > > + if (!(__args & __wait_flags::__track_contention)) > This condition seems backwards. We want to track contention (i.e. use _M_enter_wait and _M_leave_wait) when the flag is set, but this does it when it's not set. { > > > + // caller does not externally track contention > +#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT > + __pool = (__pool == nullptr) ? 
> &__waiter_pool_impl::_S_impl_for(__addr) > + : __pool; > This would be simpler as just: if (!__pool) __pool = ...; > +#endif > + __pool->_M_enter_wait(); > } > > + __wait_result_type __res; > +#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT > + if (__platform_wait_until(__wait_addr, __args._M_old, __atime)) > + __res = make_pair(true, __args._M_old); > + else > + __res = make_pair(false, __args._M_old); > +#else > + __platform_wait_t __val; > + __atomic_load(__wait_addr, &__val, __args._M_order); > + if (__val == __args._M_old) > + { > + lock_guard __l{ __pool->_M_mtx }; > + __atomic_load(__wait_addr, &__val, __args._M_order); > + if (__val == __args._M_old && > + __cond_wait_until(__pool->_M_cv, __pool->_M_mtx, __atime)) > + __res = make_pair(true, __val); > There's a missing 'else' here, so we don't set anything in __res in this case. I think it probably wants to be {false, __val} ? + } > + else > + __res = make_pair(false, __val); > +#endif > + > + if (!(__args & __wait_flags::__track_contention)) > + // caller does not externally track contention > + __pool->_M_leave_wait(); > + return __res; > + } > + > + template > + __wait_result_type > + __wait_until(const __platform_wait_t* __addr, __wait_args __args, > + const chrono::time_point<_Clock, _Dur>& __atime) > noexcept > { > > + if constexpr (is_same_v<__wait_clock_t, _Clock>) > + return __detail::__wait_until_impl(__addr, __args, __atime); > + else > { > > + auto __res = __detail::__wait_until_impl(__addr, __args, > + __to_wait_clock(__atime)); > + if (!__res.first) > + { > + // We got a timeout when measured against __clock_t but > + // we need to check against the caller-supplied clock > + // to tell whether we should return a timeout. > + if (_Clock::now() < __atime) > + return make_pair(true, __res.second); > + } > + return __res; > + } > + } > > > + template > + __wait_result_type > + __wait_for(const __platform_wait_t* __addr, __wait_args __args, > + const chrono::duration<_Rep, _Period>& __rtime) noexcept > + { > + if (!__rtime.count()) > + // no rtime supplied, just spin a bit > + return __detail::__wait_impl(__addr, __args | > __wait_flags::__spin_only); > This should set __do_spin | __spin_only if the latter no longer implies the former. 
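To spell that out, here is roughly what I mean (a sketch using the patch's names, untested): if __spin_only becomes a plain bit that no longer includes __do_spin, then the `__args & __wait_flags::__spin_only` check in __wait_impl/__wait_until_impl only fires when the caller really asked for a spin-only wait, and the no-timeout branch of __wait_for has to request both bits explicitly:

  enum class __wait_flags : uint32_t
  {
    __abi_version = 0,
    __proxy_wait = 1,
    __track_contention = 2,
    __do_spin = 4,
    __spin_only = 8, // no longer implies __do_spin
    __abi_version_mask = 0xffff0000,
  };

  // in __wait_for, when no relative time is supplied:
  if (!__rtime.count())
    // no rtime supplied, just spin a bit
    return __detail::__wait_impl(__addr,
	       __args | __wait_flags::__do_spin | __wait_flags::__spin_only);
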
> + auto const __reltime = > chrono::ceil<__wait_clock_t::duration>(__rtime); > + auto const __atime = chrono::steady_clock::now() + __reltime; > + return __detail::__wait_until(__addr, __args, __atime); > + } > } // namespace __detail > > // returns true if wait ended before timeout > + template + typename _Pred, typename _ValFn, > + typename _Clock, typename _Dur> > + bool > + __atomic_wait_address_until(const _Tp* __addr, _Pred&& __pred, > + _ValFn&& __vfn, > + const chrono::time_point<_Clock, _Dur>& > __atime, > + bool __bare_wait = false) noexcept > + { > + const auto __wait_addr = > + reinterpret_cast(__addr); > + __detail::__wait_args __args{ __addr, __bare_wait }; > + _Tp __val = __vfn(); > + while (!__pred(__val)) > + { > + auto __res = __detail::__wait_until(__wait_addr, __args, > __atime); > + if (!__res.first) > + // timed out > + return __res.first; // C++26 will also return last observed > __val > + __val = __vfn(); > + } > + return true; // C++26 will also return last observed __val > + } > + > + template > + bool > + __atomic_wait_address_until_v(const __detail::__platform_wait_t* > __addr, > + __detail::__platform_wait_t __old, > + int __order, > + const chrono::time_point<_Clock, _Dur>& > __atime, > + bool __bare_wait = false) noexcept > + { > + __detail::__wait_args __args{ __addr, __old, __order, __bare_wait }; > + auto __res = __detail::__wait_until(__addr, __args, __atime); > + return __res.first; // C++26 will also return last observed __val > + } > + > template typename _Clock, typename _Dur> > bool > __atomic_wait_address_until_v(const _Tp* __addr, _Tp&& __old, > _ValFn&& __vfn, > - const chrono::time_point<_Clock, _Dur>& > - __atime) noexcept > + const chrono::time_point<_Clock, _Dur>& > __atime, > + bool __bare_wait = false) noexcept > { > - __detail::__enters_timed_wait __w{__addr}; > - return __w._M_do_wait_until_v(__old, __vfn, __atime); > + auto __pfn = [&](const _Tp& __val) > + { return !__detail::__atomic_compare(__old, __val); }; > + return __atomic_wait_address_until(__addr, __pfn, > forward<_ValFn>(__vfn), > + __atime, __bare_wait); > } > > - template - typename _Clock, typename _Dur> > - bool > - __atomic_wait_address_until(const _Tp* __addr, _Pred __pred, > - const chrono::time_point<_Clock, _Dur>& > - __atime) > noexcept > - { > - __detail::__enters_timed_wait __w{__addr}; > - return __w._M_do_wait_until(__pred, __atime); > - } > + template + typename _Pred, typename _ValFn, > + typename _Rep, typename _Period> > + bool > + __atomic_wait_address_for(const _Tp* __addr, _Pred&& __pred, > + _ValFn&& __vfn, > + const chrono::duration<_Rep, _Period>& > __rtime, > + bool __bare_wait = false) noexcept > + { > + const auto __wait_addr = > + reinterpret_cast(__addr); > + __detail::__wait_args __args{ __addr, __bare_wait }; > + _Tp __val = __vfn(); > + while (!__pred(__val)) > + { > + auto __res = __detail::__wait_for(__wait_addr, __args, __rtime); > + if (!__res.first) > + // timed out > + return __res.first; // C++26 will also return last observed > __val > + __val = __vfn(); > + } > + return true; // C++26 will also return last observed __val > + } > > - template - typename _Clock, typename _Dur> > + template > bool > - __atomic_wait_address_until_bare(const __detail::__platform_wait_t* > __addr, > - _Pred __pred, > - const chrono::time_point<_Clock, _Dur>& > - __atime) > noexcept > - { > - __detail::__bare_timed_wait __w{__addr}; > - return __w._M_do_wait_until(__pred, __atime); > - } > + __atomic_wait_address_for_v(const __detail::__platform_wait_t* 
__addr, > + __detail::__platform_wait_t __old, > + int __order, > + const chrono::time_point<_Rep, _Period>& > __rtime, > + bool __bare_wait = false) noexcept > + { > + __detail::__wait_args __args{ __addr, __old, __order, __bare_wait }; > + auto __res = __detail::__wait_for(__addr, __args, __rtime); > + return __res.first; // C++26 will also return last observed __Val > + } > > template typename _Rep, typename _Period> > bool > __atomic_wait_address_for_v(const _Tp* __addr, _Tp&& __old, _ValFn&& > __vfn, > - const chrono::duration<_Rep, _Period>& __rtime) > noexcept > + const chrono::duration<_Rep, _Period>& > __rtime, > + bool __bare_wait = false) noexcept > { > - __detail::__enters_timed_wait __w{__addr}; > - return __w._M_do_wait_for_v(__old, __vfn, __rtime); > - } > - > - template - typename _Rep, typename _Period> > - bool > - __atomic_wait_address_for(const _Tp* __addr, _Pred __pred, > - const chrono::duration<_Rep, _Period>& __rtime) > noexcept > - { > - > - __detail::__enters_timed_wait __w{__addr}; > - return __w._M_do_wait_for(__pred, __rtime); > - } > - > - template - typename _Rep, typename _Period> > - bool > - __atomic_wait_address_for_bare(const __detail::__platform_wait_t* > __addr, > - _Pred __pred, > - const chrono::duration<_Rep, _Period>& __rtime) > noexcept > - { > - __detail::__bare_timed_wait __w{__addr}; > - return __w._M_do_wait_for(__pred, __rtime); > + auto __pfn = [&](const _Tp& __val) > + { return !__detail::__atomic_compare(__old, __val); }; > + return __atomic_wait_address_for(__addr, __pfn, > forward<_ValFn>(__vfn), > + __rtime, __bare_wait); > } > _GLIBCXX_END_NAMESPACE_VERSION > } // namespace std > diff --git a/libstdc++-v3/include/bits/atomic_wait.h > b/libstdc++-v3/include/bits/atomic_wait.h > index 506b7f0cc9d..63d132fc9a8 100644 > --- a/libstdc++-v3/include/bits/atomic_wait.h > +++ b/libstdc++-v3/include/bits/atomic_wait.h > @@ -47,7 +47,8 @@ > # include > #endif > > -# include // std::mutex, std::__condvar > +#include > +#include // std::mutex, std::__condvar > > namespace std _GLIBCXX_VISIBILITY(default) > { > @@ -131,7 +132,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > __thread_yield() noexcept > { > #if defined _GLIBCXX_HAS_GTHREADS && defined _GLIBCXX_USE_SCHED_YIELD > - __gthread_yield(); > + __gthread_yield(); > #endif > } > > @@ -188,7 +157,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0; > } > > - struct __waiter_pool_base > + struct __waiter_pool_impl > Would it make sense to call this type "__wait_proxy" or something like that? "Waiter pool impl" doesn't really tell me much, where is the waiter pool that this is implementing? There used to be __waiter_pool and __waiter_pool_base and __waiter, but now there's just __waiter_pool_impl which isn't very descriptive in isolation. > { > // Don't use std::hardware_destructive_interference_size here > because we > // don't want the layout of library types to depend on compiler > options. 
> @@ -205,7 +174,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > #ifndef _GLIBCXX_HAVE_PLATFORM_WAIT > __condvar _M_cv; > #endif > - __waiter_pool_base() = default; > + __waiter_pool_impl() = default; > > void > _M_enter_wait() noexcept > > - > - static __waiter_pool_base& > - _S_for(const void* __addr) noexcept > + static __waiter_pool_impl& > + _S_impl_for(const void* __addr) noexcept > { > constexpr uintptr_t __ct = 16; > - static __waiter_pool_base __w[__ct]; > + static __waiter_pool_impl __w[__ct]; > auto __key = (uintptr_t(__addr) >> 2) % __ct; > return __w[__key]; > } > }; > > + enum class __wait_flags : uint32_t > { > > + __abi_version = 0, > + __proxy_wait = 1, > + __track_contention = 2, > + __do_spin = 4, > + __spin_only = 8 | __do_spin, // implies __do_spin > This should be just 8 and not also imply __do_spin. > + __abi_version_mask = 0xffff0000, > }; > > - template > - struct __waiter_base > + struct __wait_args > + { > + __platform_wait_t _M_old; > + int _M_order = __ATOMIC_ACQUIRE; > + __wait_flags _M_flags; > + > + template > + explicit __wait_args(const _Tp* __addr, > + bool __bare_wait = false) noexcept > + : _M_flags{ _S_flags_for(__addr, __bare_wait) } > + { } > + > + __wait_args(const __platform_wait_t* __addr, __platform_wait_t > __old, > + int __order, bool __bare_wait = false) noexcept > + : _M_old{ __old } > + , _M_order{ __order } > + , _M_flags{ _S_flags_for(__addr, __bare_wait) } > + { } > + > + __wait_args(const __wait_args&) noexcept = default; > + __wait_args& > + operator=(const __wait_args&) noexcept = default; > + > + bool > + operator&(__wait_flags __flag) const noexcept > { > > > + using __t = underlying_type_t<__wait_flags>; > + return static_cast<__t>(_M_flags) > + & static_cast<__t>(__flag); > + } > > + __wait_args > + operator|(__wait_flags __flag) const noexcept > + { > + using __t = underlying_type_t<__wait_flags>; > + __wait_args __res{ *this }; > + const auto __flags = static_cast<__t>(__res._M_flags) > + | static_cast<__t>(__flag); > + __res._M_flags = __wait_flags{ __flags }; > + return __res; > + } > Seems like it would be clearer to define operator| for the actual enum type, so that these member functions don't need to repeat all the casting logic. > + private: > + static int > + constexpr _S_default_flags() noexcept > + { > + using __t = underlying_type_t<__wait_flags>; > + return static_cast<__t>(__wait_flags::__abi_version) > + | static_cast<__t>(__wait_flags::__do_spin); > + } > > + template > + static int > + constexpr _S_flags_for(const _Tp*, bool __bare_wait) noexcept > { > > + auto __res = _S_default_flags(); > + if (!__bare_wait) > + __res |= static_cast(__wait_flags::__track_contention); > + if constexpr (!__platform_wait_uses_type<_Tp>) > + __res |= static_cast(__wait_flags::__proxy_wait); > + return __res; > } > + template > + static int > + _S_memory_order_for(const _Tp*, int __order) noexcept > { > > + if constexpr (__platform_wait_uses_type<_Tp>) > + return __order; > + return __ATOMIC_ACQUIRE; > + } > + }; > This function is never used. Should it be? Is the idea to force the use of acquire ordering when waiting on a proxy instead of waiting on the atomic variable directly? Is that done elsewhere? Can this function be removed, or should it be used somewhere? 
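Going back to the operator| comment just above, something like these non-member overloads (untested sketch) would let __wait_args::operator| and operator& be written without repeating the casting logic:

  constexpr __wait_flags
  operator|(__wait_flags __a, __wait_flags __b) noexcept
  {
    using __t = underlying_type_t<__wait_flags>;
    return static_cast<__wait_flags>(static_cast<__t>(__a)
				       | static_cast<__t>(__b));
  }

  constexpr bool
  operator&(__wait_flags __a, __wait_flags __b) noexcept
  {
    using __t = underlying_type_t<__wait_flags>;
    return static_cast<__t>(__a) & static_cast<__t>(__b);
  }
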
> + > + using __wait_result_type = pair; > + inline __wait_result_type > + __spin_impl(const __platform_wait_t* __addr, __wait_args __args) > + { > + __platform_wait_t __val; > + for (auto __i = 0; __i < __atomic_spin_count; ++__i) > + { > + __atomic_load(__addr, &__val, __args._M_order); > + if (__val != __args._M_old) > + return make_pair(true, __val); > + if (__i < __atomic_spin_count_relax) > + __detail::__thread_relax(); > + else > + __detail::__thread_yield(); > + } > + return make_pair(false, __val); > Every use of make_pair can be replaced by just {x, y} to avoid making a function call (which will compile faster too). > + } > + > + inline __wait_result_type > + __wait_impl(const __platform_wait_t* __addr, __wait_args __args) > + { > +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT > + __waiter_pool_impl* __pool = nullptr; > +#else > + // if we don't have __platform_wait, we always need the side-table > + __waiter_pool_impl* __pool = > &__waiter_pool_impl::_S_impl_for(__addr); > +#endif > + > + __platform_wait_t* __wait_addr; > This could be const __platform_wait_t* > + if (__args & __wait_flags::__proxy_wait) > + { > +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT > + __pool = &__waiter_pool_impl::_S_impl_for(__addr); > +#endif > + __wait_addr = &__pool->_M_ver; > + __atomic_load(__wait_addr, &__args._M_old, __args._M_order); > + } > + else > + __wait_addr = const_cast<__platform_wait_t*>(__addr); > + > + if (__args & __wait_flags::__do_spin) > + { > + auto __res = __detail::__spin_impl(__wait_addr, __args); > + if (__res.first) > + return __res; > + if (__args & __wait_flags::__spin_only) > + return __res; > As above, this early return is always taken, resulting in busy loops. } > > + if (!(__args & __wait_flags::__track_contention)) > + { > + // caller does not externally track contention > +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT > + __pool = (__pool == nullptr) ? > &__waiter_pool_impl::_S_impl_for(__addr) > + : __pool; > if (!pool) pool = ...; > +#endif > + __pool->_M_enter_wait(); > + } > > + __wait_result_type __res; > +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT > + __platform_wait(__wait_addr, __args._M_old); > + __res = make_pair(false, __args._M_old); > +#else > + __platform_wait_t __val; > + __atomic_load(__wait_addr, &__val, __args._M_order); > + if (__val == __args._M_old) > + { > + lock_guard __l{ __pool->_M_mtx }; > + __atomic_load(__wait_addr, &__val, __args._M_order); > + if (__val == __args._M_old) > + __pool->_M_cv.wait(__pool->_M_mtx); > + } > + __res = make_pair(false, __val); > +#endif > > - using __enters_wait = __waiter; > - using __bare_wait = __waiter; > + if (!(__args & __wait_flags::__track_contention)) > Condition is backwards? > + // caller does not externally track contention > + __pool->_M_leave_wait(); > This _M_leave_wait() could be done by an RAII type's destructor instead. > + return __res; > + } > + > + inline void > + __notify_impl(const __platform_wait_t* __addr, [[maybe_unused]] bool > __all, > + __wait_args __args) > + { > +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT > + __waiter_pool_impl* __pool = nullptr; > +#else > + // if we don't have __platform_notify, we always need the side-table > + __waiter_pool_impl* __pool = > &__waiter_pool_impl::_S_impl_for(__addr); > +#endif > + > + if (!(__args & __wait_flags::__track_contention)) > Condition is backwards? 
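On the RAII point a few comments up, I'm thinking of a small guard type along these lines (hypothetical name, untested sketch), so that _M_enter_wait/_M_leave_wait stay paired on every path out of __wait_impl and __wait_until_impl, including exceptional ones:

  struct __scoped_waiter // hypothetical helper, not in the patch
  {
    explicit
    __scoped_waiter(__waiter_pool_impl* __pool) noexcept
    : _M_pool(__pool)
    {
      if (_M_pool)
	_M_pool->_M_enter_wait();
    }

    ~__scoped_waiter()
    {
      if (_M_pool)
	_M_pool->_M_leave_wait();
    }

    __scoped_waiter(const __scoped_waiter&) = delete;
    __scoped_waiter& operator=(const __scoped_waiter&) = delete;

    __waiter_pool_impl* _M_pool;
  };

The wait functions would construct it with the pool pointer when __wait_flags::__track_contention is set, and with nullptr for a bare wait.
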
> + { > +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT > + __pool = &__waiter_pool_impl::_S_impl_for(__addr); > +#endif > + if (!__pool->_M_waiting()) > + return; > + } > + > + __platform_wait_t* __wait_addr; > + if (__args & __wait_flags::__proxy_wait) > + { > +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT > + __pool = (__pool == nullptr) ? > &__waiter_pool_impl::_S_impl_for(__addr) > + : __pool; > if (!pool) pool = ...; > +#endif > + __wait_addr = &__pool->_M_ver; > + __atomic_fetch_add(__wait_addr, 1, __ATOMIC_RELAXED); > + __all = true; > + } > For a non-proxied wait, __wait_addr is uninitialized here. So the futex wait uses a random address. There needs to be `else __wait_addr = __addr;` + > +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT > + __platform_notify(__wait_addr, __all); > +#else > + lock_guard __l{ __pool->_M_mtx }; > + __pool->_M_cv.notify_all(); > +#endif > + } > } // namespace __detail > > + template + typename _Pred, typename _ValFn> > + void > + __atomic_wait_address(const _Tp* __addr, > + _Pred&& __pred, _ValFn&& __vfn, > + bool __bare_wait = false) noexcept > + { > + const auto __wait_addr = > + reinterpret_cast(__addr); > + __detail::__wait_args __args{ __addr, __bare_wait }; > + _Tp __val = __vfn(); > + while (!__pred(__val)) > + { > As observed by Nate, the old value is never set in the __args, so the wait always uses 0 (the value set in the __wait_args constructor). It needs: if constexpr (__platform_wait_uses_type<_Tp>) __args._M_old = __builtin_bit_cast(__platform_wait_t, __val); For proxied waits _M_old isn't used, pool->_M_ver is used instead. > + __detail::__wait_impl(__wait_addr, __args); > + __val = __vfn(); > + } > + // C++26 will return __val > + } > + > + inline void > + __atomic_wait_address_v(const __detail::__platform_wait_t* __addr, > + __detail::__platform_wait_t __old, > + int __order) > + { > + __detail::__wait_args __args{ __addr, __old, __order }; > + // C++26 will not ignore the return value here > + __detail::__wait_impl(__addr, __args); > + } > + > template > void > __atomic_wait_address_v(const _Tp* __addr, _Tp __old, > _ValFn __vfn) noexcept > { > diff --git a/libstdc++-v3/include/std/latch > b/libstdc++-v3/include/std/latch > index 27cd80d7100..525647c1701 100644 > --- a/libstdc++-v3/include/std/latch > +++ b/libstdc++-v3/include/std/latch > @@ -74,8 +74,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > _GLIBCXX_ALWAYS_INLINE void > wait() const noexcept > { > - auto const __pred = [this] { return this->try_wait(); }; > - std::__atomic_wait_address(&_M_a, __pred); > + auto const __vfn = [this] { return this->try_wait(); }; > + auto const __pred = [this](bool __b) { return __b; }; > This second lambda doesn't need to capture 'this'. > + std::__atomic_wait_address(&_M_a, __pred, __vfn); > } > > _GLIBCXX_ALWAYS_INLINE void > >
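To be explicit about that last comment, the predicate in std::latch::wait() only needs its bool argument, so (sketch):

      auto const __vfn = [this] { return this->try_wait(); };
      auto const __pred = [](bool __b) { return __b; };
      std::__atomic_wait_address(&_M_a, __pred, __vfn);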