C++2a synchronisation inefficient in GCC 11

public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed

* C++2a synchronisation inefficient in GCC 11
@ 2021-02-25 22:50 Thiago Macieira
  2021-02-26 11:19 ` Jonathan Wakely
                   ` (2 more replies)
  0 siblings, 3 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-02-25 22:50 UTC (permalink / raw)
  To: libstdc++; +Cc: trodgers, jwakely

Hello all

I was investigating the implementation of std::atomic<T>::wait to use inside 
my own code (Qt's QMutex and QSemaphore), when I realised that it has 
huge inefficiencies. Since everything in this implementation is inline and, 
once released, it will tie our hands until the next ABI break (libstdc++.so.
7). And no, inline namespaces and ABI tags are no different than ABI breaks, 
they only make the ABI break more or less silent in the process.

Here's a summary of the findings:

 1) everything is inline
 2) futex code is still behind a lot of code calling into std::_Hash_bytes
 3) other int-sized (incl. enums) atomics don't use futex
 4) std::latch and std::counting_semaphore defaults preclude from using
    futex on Linux
 5) std::barrier implementation also uses a type that futex(2) can't handle

Here's what the code currently looks like: https://gcc.godbolt.org/z/MeP8rr

Given how far we are in the release cycle, a simple solution is to simply 
disable the entire solution by #if 0'ing out bits/atomic_wait.h. If 
__cpp_lib_atomic_wait isn't defined, the rest of the functionality cascades 
disabled too. Then we can spend however long is necessary to get a good 
implementation for GCC 12.

I have patches for all but the first issue, but they are a result of only 3 
hours of hacking to prove the concept. I will post them, but I do not expect 
them to get merged as-is.

Details:

1) everything is inline

I think this is unnecessary optimisation, especially since it expands to a LOT 
of code (see link above). A cursory research into libc++ and MS STL shows that 
the actual wait and notification are out-of-line. I recommend we do the same 
for libstdc++: if we're going to syscall anyway, the extra function call isn't 
going to be the performance bottleneck.

In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050360.html>, Jonathan 
wrote:
> Eventually we'll want to move the rest of this function (which doesn't
> depend on the template argument) into the compiled library, but it's
> better to be header-only for now.

But that has never happened.

Having out-of-line implementations would allow the implementation for generic-
sized atomic wait/notify to fine-tune as time and experience progresses. The 
std::__detail::__waiter implementation looks clever, but it's unclear if it is 
the best for all workloads. Moreover, the implementation is limited to 16 
waiters.

Moreover, it would allow us to have support in the future for a 64-bit futex. 
Linus did shoot down any need for it 13 years ago, but the world has changed 
since and it's not inconceivable to have a pointer-sized futex on every 
platform, which on 64-bit would be 64-bit. Windows and Darwin do have support 
for 64-bit compare-and-wait functionality.

And it would allow us to have futex-equivalent support in other OSes, added at 
our leisure. The current implementation does not support Windows's 
WaitOnAddress and could never support it if released in the current state. 
It's also possible other OSes will eventually add equivalent functionality 
because of the C++ standard. For example, FreeBSD already has futex() support 
inside its kernel in order to run Linux binaries, but it is not exposed to the 
FreeBSD-personality syscall ABI.

Inlining certain functionality at a later date is possible. De-inlining isn't.

2) futex code for int is still behind a lot of boiler plate

It comes from the creation/use of std::__detail::__waiter in 
std::__atomic_wait and __atomic_notify. That does allow one to detect whether 
there is contention in the first place and avoid the system call, but I think 
that's premature optimisation.

In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050362.html>, Thomas 
says:
> __atomic_notify checks to see if that count is non-zero
> before invoking the platform/semaphore notify because it is cheaper
> to do the atomic load than it is to make the syscall() when there are no
> waiters.

That's true, but it ignores the fact that users of std::atomic<T>::wait() are 
likely to track contention by themselves on the value of the atomic (I know I 
do). libc++, MS STL and the original atomic_wait[1] implementation operate on 
direct system calls and don't try to track contention. Because of that, 
libstdc++ adding an extra layer of tracking is double book-keeping and 
violates "don't pay for what you don't need".

[1] https://github.com/ogiroux/atomic_wait

3) other int-sized atomics don't use futex

The kernel doesn't care what we store in those 32 bits, only that they are 
comparable (FUTEX_OP does provide less-than and greater-than operations that 
are signed, but we don't use that and the kernel API can be later extended to 
unsigned compares). Therefore, I would recommend we extend futex support to 
all 32-bit signed scalar types. This would add:
 * pointers and long on 32-bit architectures
 * unsigned
 * untyped enums and typed enums on int & unsigned int
 * float

4) std::latch and std::counting_semaphore defaults too high

The standard requires a ptrdiff_t in the API, but it does allow us to set hard 
limits and/or defaults for best performance. Right now, std::latch is 
implemented using ptrdiff_t internally and std::counting_semaphore's default 
limit uses PTRDIFF_MAX:

  template<ptrdiff_t __least_max_value =
			__gnu_cxx::__int_traits<ptrdiff_t>::__max>
    class counting_semaphore

As a consequence, those two classes will use 64-bit atomics on 64-bit 
architectures, which means they will not use futex. I recommend changing 
std::latch's internal to int and lowering the default maximum in 
std::counting_semaphore to INT_MAX.

5) std::barrier uses a type that futex can't handle

The barrier phase is the actual type that gets waited / notified on, but it is 
an enum class on unsigned char:

  enum class __barrier_phase_t : unsigned char { };

This is a harder problem to solve, but I really do recommend making it 32-bit 
in width. Whether each barrier phase holds only 256 values or the full 32 
bits, I don't know (I've implemented with wrapping on 256).

What is of consequence is that the __state_t type could increase in size 
("could" because we can simply reduce the number of tickets in each state 
entry). I haven't done enough research on this to know whether it is the right 
way to go.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-25 22:50 C++2a synchronisation inefficient in GCC 11 Thiago Macieira
@ 2021-02-26 11:19 ` Jonathan Wakely
  2021-02-26 17:37   ` Thiago Macieira
  2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
  2021-02-27  1:13 ` C++2a synchronisation inefficient in GCC 11 Thomas Rodgers
  2 siblings, 1 reply; 73+ messages in thread
From: Jonathan Wakely @ 2021-02-26 11:19 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: libstdc++, Thomas Rodgers, Jonathan Wakely

On Thu, 25 Feb 2021 at 23:28, Thiago Macieira via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
>
> Hello all
>
> I was investigating the implementation of std::atomic<T>::wait to use inside
> my own code (Qt's QMutex and QSemaphore), when I realised that it has
> huge inefficiencies. Since everything in this implementation is inline and,
> once released, it will tie our hands until the next ABI break (libstdc++.so.

That's not true. By not committing to exporting these symbols from
libstdc++.so now, we can move them into the library later once the
design is stable.

C++20 support is experimental and not stable. If you compile it with
GCC 11 you can't expect to link it to C++20 code compiled with GCC 12.
So for GCC 12 (or a later version) we could replace the inline
functions with non-inline code in libstdc++.so and commit to that
being stable.

So actually this gives us more freedom to change it later.

> 7). And no, inline namespaces and ABI tags are no different than ABI breaks,
> they only make the ABI break more or less silent in the process.
>
> Here's a summary of the findings:
>
>  1) everything is inline
>  2) futex code is still behind a lot of code calling into std::_Hash_bytes
>  3) other int-sized (incl. enums) atomics don't use futex

Yup, including unsigned ints, and longs on ILP32 targets.

>  4) std::latch and std::counting_semaphore defaults preclude from using
>     futex on Linux

Yes, I've suggested that we should use INT_MAX (or maybe UINT_MAX) as
the default, so they fit in the futex range.

I'm not working today, so I'll read the rest and respond to the test
another time.

tl;dr this code is experimental and expected to change, so is inline
for a reason.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-02-25 22:50 C++2a synchronisation inefficient in GCC 11 Thiago Macieira
  2021-02-26 11:19 ` Jonathan Wakely
@ 2021-02-26 15:59 ` Thiago Macieira
  2021-02-26 15:59   ` [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int Thiago Macieira
                     ` (5 more replies)
  2021-02-27  1:13 ` C++2a synchronisation inefficient in GCC 11 Thomas Rodgers
  2 siblings, 6 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 15:59 UTC (permalink / raw)
  To: libstdc++; +Cc: gcc-patches

ints can be used as futex on Linux. ptrdiff_t on 64-bit Linux can't.
---
 libstdc++-v3/include/std/latch | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/latch b/libstdc++-v3/include/std/latch
index ef8c301e5e9..156aea5c5e5 100644
--- a/libstdc++-v3/include/std/latch
+++ b/libstdc++-v3/include/std/latch
@@ -48,7 +48,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   public:
     static constexpr ptrdiff_t
     max() noexcept
-    { return __gnu_cxx::__int_traits<ptrdiff_t>::__max; }
+    { return __gnu_cxx::__int_traits<int>::__max; }
 
     constexpr explicit latch(ptrdiff_t __expected) noexcept
       : _M_a(__expected) { }
@@ -85,7 +85,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     }
 
   private:
-    alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
+    alignas(__alignof__(int)) int _M_a;
   };
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
-- 
2.30.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int
  2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
@ 2021-02-26 15:59   ` Thiago Macieira
  2021-03-03 14:34     ` Jonathan Wakely
  2021-02-26 15:59   ` [PATCH 3/5] std::__atomic_wait: don't use __detail::__waiter with futex Thiago Macieira
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 15:59 UTC (permalink / raw)
  To: libstdc++; +Cc: gcc-patches

The kernel doesn't care what we store in those 32 bits, only that they are
comparable. This commit adds:
 * pointers and long on 32-bit architectures
 * unsigned
 * untyped enums and typed enums on int & unsigned int
 * float

We're not using FUTEX_OP anywhere today. The kernel reserves 4 bits for
this field but has only used 6 values so far, so it can be extended to
unsigned compares in the future, if needed.
---
 libstdc++-v3/include/bits/atomic_wait.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
index 424fccbe4c5..1c6bda2e2b6 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -65,7 +65,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     template<typename _Tp>
       inline constexpr bool __platform_wait_uses_type
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
-	= is_same_v<remove_cv_t<_Tp>, __platform_wait_t>;
+	= is_scalar_v<remove_cv_t<_Tp>> && sizeof(_Tp) == sizeof(__platform_wait_t)
+	  && alignof(_Tp) >= alignof(__platform_wait_t);
 #else
 	= false;
 #endif
@@ -91,13 +92,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
     template<typename _Tp>
       void
-      __platform_wait(const _Tp* __addr, __platform_wait_t __val) noexcept
+      __platform_wait(const _Tp* __addr, _Tp __val) noexcept
       {
 	for(;;)
 	  {
 	    auto __e = syscall (SYS_futex, static_cast<const void*>(__addr),
 				  static_cast<int>(__futex_wait_flags::__wait_private),
-				    __val, nullptr);
+				  static_cast<__platform_wait_t>(__val), nullptr);
 	    if (!__e || errno == EAGAIN)
 	      break;
 	    else if (errno != EINTR)
-- 
2.30.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 3/5] std::__atomic_wait: don't use __detail::__waiter with futex
  2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
  2021-02-26 15:59   ` [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int Thiago Macieira
@ 2021-02-26 15:59   ` Thiago Macieira
  2021-02-26 15:59   ` [PATCH 4/5] barrier: use int instead of unsigned char for the phase state Thiago Macieira
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 15:59 UTC (permalink / raw)
  To: libstdc++; +Cc: gcc-patches

That violates "don't pay for what you don't need" rule of C++. Users of
std::atomic<T>::wait will often have some bit in their atomic indicate
whether the value is contended or not, so we don't need libstdc++ to do
double book-keeping for us.
---
 libstdc++-v3/include/bits/atomic_wait.h | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
index 1c6bda2e2b6..4d240f44faf 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -258,14 +258,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       if (std::__atomic_spin(__pred))
 	return;
 
-      __waiter __w(__addr);
-      while (!__pred())
+      if constexpr (__platform_wait_uses_type<_Tp>)
 	{
-	  if constexpr (__platform_wait_uses_type<_Tp>)
-	    {
-	      __platform_wait(__addr, __old);
-	    }
-	  else
+	  __platform_wait(__addr, __old);
+	}
+      else
+	{
+	  __waiter __w(__addr);
+	  while (!__pred())
 	    {
 	      // TODO support timed backoff when this can be moved into the lib
 	      __w._M_do_wait();
@@ -274,13 +274,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     }
 
   template<typename _Tp>
-    void
+    inline void
     __atomic_notify(const _Tp* __addr, bool __all) noexcept
     {
       using namespace __detail;
-      auto& __w = __waiters::_S_for((void*)__addr);
-      if (!__w._M_waiting())
-	return;
 
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
       if constexpr (__platform_wait_uses_type<_Tp>)
@@ -290,6 +287,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       else
 #endif
 	{
+	  auto& __w = __waiters::_S_for((void*)__addr);
+	  if (!__w._M_waiting())
+	    return;
 	  __w._M_notify(__all);
 	}
     }
-- 
2.30.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
  2021-02-26 15:59   ` [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int Thiago Macieira
  2021-02-26 15:59   ` [PATCH 3/5] std::__atomic_wait: don't use __detail::__waiter with futex Thiago Macieira
@ 2021-02-26 15:59   ` Thiago Macieira
  2021-02-28 15:05     ` Hans-Peter Nilsson
  2021-02-26 15:59   ` [PATCH 5/5] barrier: optimise by not having the hasher in a loop Thiago Macieira
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 15:59 UTC (permalink / raw)
  To: libstdc++; +Cc: gcc-patches

ints can be used in futexes. chars can't.
---
 libstdc++-v3/include/std/barrier | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
index e09212dfcb9..ae058bd3dc3 100644
--- a/libstdc++-v3/include/std/barrier
+++ b/libstdc++-v3/include/std/barrier
@@ -70,7 +70,7 @@ It looks different from literature pseudocode for two main reasons:
 
 */
 
-  enum class __barrier_phase_t : unsigned char { };
+  enum class __barrier_phase_t : int { };
 
   template<typename _CompletionF>
     class __tree_barrier
@@ -93,20 +93,24 @@ It looks different from literature pseudocode for two main reasons:
 
       alignas(__phase_alignment) __barrier_phase_t  _M_phase;
 
+      static __barrier_phase_t
+      _S_add_to_phase(__barrier_phase_t __phase, unsigned char __n)
+      {
+        __n += static_cast<unsigned char>(__phase);
+        return static_cast<__barrier_phase_t>(__n);
+      }
+
       bool
       _M_arrive(__barrier_phase_t __old_phase)
       {
-	const auto __old_phase_val = static_cast<unsigned char>(__old_phase);
-	const auto __half_step =
-			   static_cast<__barrier_phase_t>(__old_phase_val + 1);
-	const auto __full_step =
-			   static_cast<__barrier_phase_t>(__old_phase_val + 2);
-
 	size_t __current_expected = _M_expected;
 	std::hash<std::thread::id> __hasher;
 	size_t __current = __hasher(std::this_thread::get_id())
 					  % ((_M_expected + 1) >> 1);
 
+	const auto __half_step = _S_add_to_phase(__old_phase, 1);
+	const auto __full_step = _S_add_to_phase(__old_phase, 2);
+
 	for (int __round = 0; ; ++__round)
 	  {
 	    if (__current_expected <= 1)
@@ -165,7 +169,6 @@ It looks different from literature pseudocode for two main reasons:
       {
 	__atomic_phase_ref_t __phase(_M_phase);
 	const auto __old_phase = __phase.load(memory_order_relaxed);
-	const auto __cur = static_cast<unsigned char>(__old_phase);
 	for(; __update; --__update)
 	  {
 	    if(_M_arrive(__old_phase))
@@ -173,7 +176,7 @@ It looks different from literature pseudocode for two main reasons:
 		_M_completion();
 		_M_expected += _M_expected_adjustment.load(memory_order_relaxed);
 		_M_expected_adjustment.store(0, memory_order_relaxed);
-		auto __new_phase = static_cast<__barrier_phase_t>(__cur + 2);
+		auto __new_phase = _S_add_to_phase(__old_phase, 2);
 		__phase.store(__new_phase, memory_order_release);
 		__phase.notify_all();
 	      }
-- 
2.30.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 5/5] barrier: optimise by not having the hasher in a loop
  2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
                     ` (2 preceding siblings ...)
  2021-02-26 15:59   ` [PATCH 4/5] barrier: use int instead of unsigned char for the phase state Thiago Macieira
@ 2021-02-26 15:59   ` Thiago Macieira
  2021-03-03 14:36     ` Jonathan Wakely
  2021-02-26 18:14   ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Andreas Schwab
  2021-03-03 14:34   ` Jonathan Wakely
  5 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 15:59 UTC (permalink / raw)
  To: libstdc++; +Cc: gcc-patches

Our thread's ID does not change so we don't have to get it every time
and hash it every time.
---
 libstdc++-v3/include/std/barrier | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
index ae058bd3dc3..eb31a89b175 100644
--- a/libstdc++-v3/include/std/barrier
+++ b/libstdc++-v3/include/std/barrier
@@ -101,12 +101,10 @@ It looks different from literature pseudocode for two main reasons:
       }
 
       bool
-      _M_arrive(__barrier_phase_t __old_phase)
+      _M_arrive(__barrier_phase_t __old_phase, size_t __current)
       {
 	size_t __current_expected = _M_expected;
-	std::hash<std::thread::id> __hasher;
-	size_t __current = __hasher(std::this_thread::get_id())
-					  % ((_M_expected + 1) >> 1);
+        __current %= ((__current_expected + 1) >> 1);
 
 	const auto __half_step = _S_add_to_phase(__old_phase, 1);
 	const auto __full_step = _S_add_to_phase(__old_phase, 2);
@@ -167,11 +165,13 @@ It looks different from literature pseudocode for two main reasons:
       [[nodiscard]] arrival_token
       arrive(ptrdiff_t __update)
       {
+	std::hash<std::thread::id> __hasher;
+	size_t __current = __hasher(std::this_thread::get_id());
 	__atomic_phase_ref_t __phase(_M_phase);
 	const auto __old_phase = __phase.load(memory_order_relaxed);
 	for(; __update; --__update)
 	  {
-	    if(_M_arrive(__old_phase))
+	    if(_M_arrive(__old_phase, __current))
 	      {
 		_M_completion();
 		_M_expected += _M_expected_adjustment.load(memory_order_relaxed);
-- 
2.30.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-26 11:19 ` Jonathan Wakely
@ 2021-02-26 17:37   ` Thiago Macieira
  2021-02-26 18:29     ` Jonathan Wakely
  2021-02-26 18:47     ` Ville Voutilainen
  0 siblings, 2 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 17:37 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, Thomas Rodgers

On Friday, 26 February 2021 03:19:02 PST Jonathan Wakely wrote:
> > has huge inefficiencies. Since everything in this implementation is
> > inline and, once released, it will tie our hands until the next ABI break
> > (libstdc++.so.
> That's not true. By not committing to exporting these symbols from
> libstdc++.so now, we can move them into the library later once the
> design is stable.

Hello Jonathan

Thank you for replying.

The problem is not the export or not export, it's that the protocol/method is 
locked down. We cannot change it in a future version, like changing the number 
of __waiter objects in the __waiters::_S_for function. Since that code gets 
inlined into every executable that uses it, future versions are bound to use 
the exact same selection until we break ABI.

> C++20 support is experimental and not stable. If you compile it with
> GCC 11 you can't expect to link it to C++20 code compiled with GCC 12.
> So for GCC 12 (or a later version) we could replace the inline
> functions with non-inline code in libstdc++.so and commit to that
> being stable.

Oh? That's not a really widely understood fact. I see people using C++20 right 
now all the time in the cpplang slack and no one seems to be worried that 
support is experimental.

Still, what I'm proposing is that we minimise the fallout right now.

> >  4) std::latch and std::counting_semaphore defaults preclude from using
> >     futex on Linux
> 
> Yes, I've suggested that we should use INT_MAX (or maybe UINT_MAX) as
> the default, so they fit in the futex range.

Hmm... I didn't test increasing to UINT_MAX after my change to 
__platform_wait_uses_type. I guess it should just work.

> I'm not working today, so I'll read the rest and respond to the test
> another time.
> 
> tl;dr this code is experimental and expected to change, so is inline
> for a reason.

Thanks, but I still propose we hide it better.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
                     ` (3 preceding siblings ...)
  2021-02-26 15:59   ` [PATCH 5/5] barrier: optimise by not having the hasher in a loop Thiago Macieira
@ 2021-02-26 18:14   ` Andreas Schwab
  2021-02-26 19:08     ` Thiago Macieira
  2021-03-03 14:34   ` Jonathan Wakely
  5 siblings, 1 reply; 73+ messages in thread
From: Andreas Schwab @ 2021-02-26 18:14 UTC (permalink / raw)
  To: Thiago Macieira via Gcc-patches; +Cc: libstdc++, Thiago Macieira

On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:

> @@ -85,7 +85,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      }
>  
>    private:
> -    alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
> +    alignas(__alignof__(int)) int _M_a;

Futexes must be aligned to 4 bytes.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-26 17:37   ` Thiago Macieira
@ 2021-02-26 18:29     ` Jonathan Wakely
  2021-02-26 19:30       ` Ville Voutilainen
  2021-02-26 18:47     ` Ville Voutilainen
  1 sibling, 1 reply; 73+ messages in thread
From: Jonathan Wakely @ 2021-02-26 18:29 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++

On 26/02/21 09:37 -0800, Thiago Macieira via Libstdc++ wrote:
>On Friday, 26 February 2021 03:19:02 PST Jonathan Wakely wrote:
>> > has huge inefficiencies. Since everything in this implementation is
>> > inline and, once released, it will tie our hands until the next ABI break
>> > (libstdc++.so.
>> That's not true. By not committing to exporting these symbols from
>> libstdc++.so now, we can move them into the library later once the
>> design is stable.
>
>Hello Jonathan
>
>Thank you for replying.
>
>The problem is not the export or not export, it's that the protocol/method is
>locked down. We cannot change it in a future version, like changing the number
>of __waiter objects in the __waiters::_S_for function. Since that code gets
>inlined into every executable that uses it, future versions are bound to use
>the exact same selection until we break ABI.

Compiling C++20 code with GCC 11 and linking to C++20 code compiled
with GCC != 11 is unsupported. The ABI is expected to break for C++20
features.

We can change it. We WILL change it.

The world is not ending.

>> C++20 support is experimental and not stable. If you compile it with
>> GCC 11 you can't expect to link it to C++20 code compiled with GCC 12.
>> So for GCC 12 (or a later version) we could replace the inline
>> functions with non-inline code in libstdc++.so and commit to that
>> being stable.
>
>Oh? That's not a really widely understood fact. I see people using C++20 right
>now all the time in the cpplang slack and no one seems to be worried that
>support is experimental.

I can't really help it if people don't read documentation for
important tools they rely on.

https://gcc.gnu.org/projects/cxx-status.html#cxx20

https://gcc.gnu.org/gcc-8/changes.html#libstdcxx
https://gcc.gnu.org/gcc-9/changes.html#libstdcxx
https://gcc.gnu.org/gcc-10/changes.html#libstdcxx
https://gcc.gnu.org/gcc-11/changes.html#libstdcxx


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-26 17:37   ` Thiago Macieira
  2021-02-26 18:29     ` Jonathan Wakely
@ 2021-02-26 18:47     ` Ville Voutilainen
  2021-02-26 23:53       ` Thiago Macieira
  1 sibling, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-26 18:47 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++

On Fri, 26 Feb 2021 at 20:43, Thiago Macieira via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
> Oh? That's not a really widely understood fact. I see people using C++20 right
> now all the time in the cpplang slack and no one seems to be worried that
> support is experimental.

People writing toys on forums is not real production use that we
should worry about, and anyone who uses
a two-months-old standard in production use is going to survive
possible breaking changes. Or if they won't,
that's their stupid fault.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-02-26 18:14   ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Andreas Schwab
@ 2021-02-26 19:08     ` Thiago Macieira
  2021-02-26 19:31       ` Andreas Schwab
  0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 19:08 UTC (permalink / raw)
  To: Thiago Macieira via Gcc-patches, Andreas Schwab; +Cc: libstdc++

On Friday, 26 February 2021 10:14:42 PST Andreas Schwab wrote:
> On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
> > @@ -85,7 +85,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > }
> > 
> > private:
> > -    alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
> > +    alignas(__alignof__(int)) int _M_a;
> 
> Futexes must be aligned to 4 bytes.

Agreed, but doesn't this accomplish that?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-26 18:29     ` Jonathan Wakely
@ 2021-02-26 19:30       ` Ville Voutilainen
  2021-02-26 21:17         ` Jonathan Wakely
  0 siblings, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-26 19:30 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Thiago Macieira, Thomas Rodgers, libstdc++

On Fri, 26 Feb 2021 at 21:26, Jonathan Wakely via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
> >Oh? That's not a really widely understood fact. I see people using C++20 right
> >now all the time in the cpplang slack and no one seems to be worried that
> >support is experimental.
>
> I can't really help it if people don't read documentation for
> important tools they rely on.
>
> https://gcc.gnu.org/projects/cxx-status.html#cxx20

Also, if it's not a sufficient hint that you need to explicitly tell
the compiler to enter C++20 mode, there's
nothing worth our time to protect such users from the perils of the
world. And trying to remain
ABI-compatible between experimental C++20 support and its subsequent
C++20 support absolutely isn't one
of the things worth our time, even for users that are not idiots. It's
experimental, and We Mean It.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-02-26 19:08     ` Thiago Macieira
@ 2021-02-26 19:31       ` Andreas Schwab
  2021-02-27  0:13         ` Thiago Macieira
  0 siblings, 1 reply; 73+ messages in thread
From: Andreas Schwab @ 2021-02-26 19:31 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Thiago Macieira via Gcc-patches, libstdc++

On Feb 26 2021, Thiago Macieira wrote:

> On Friday, 26 February 2021 10:14:42 PST Andreas Schwab wrote:
>> On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
>> > @@ -85,7 +85,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> > }
>> > 
>> > private:
>> > -    alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
>> > +    alignas(__alignof__(int)) int _M_a;
>> 
>> Futexes must be aligned to 4 bytes.
>
> Agreed, but doesn't this accomplish that?

No.  It uses whatever alignment the type already has, and is an
elaborate no-op.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-26 19:30       ` Ville Voutilainen
@ 2021-02-26 21:17         ` Jonathan Wakely
  2021-02-26 21:18           ` Ville Voutilainen
  0 siblings, 1 reply; 73+ messages in thread
From: Jonathan Wakely @ 2021-02-26 21:17 UTC (permalink / raw)
  To: Ville Voutilainen
  Cc: Jonathan Wakely, Thomas Rodgers, Thiago Macieira, libstdc++

On Fri, 26 Feb 2021, 20:22 Ville Voutilainen via Libstdc++, <
libstdc++@gcc.gnu.org> wrote:

> On Fri, 26 Feb 2021 at 21:26, Jonathan Wakely via Libstdc++
> <libstdc++@gcc.gnu.org> wrote:
> > >Oh? That's not a really widely understood fact. I see people using
> C++20 right
> > >now all the time in the cpplang slack and no one seems to be worried
> that
> > >support is experimental.
> >
> > I can't really help it if people don't read documentation for
> > important tools they rely on.
> >
> > https://gcc.gnu.org/projects/cxx-status.html#cxx20
>
> Also, if it's not a sufficient hint that you need to explicitly tell
> the compiler to enter C++20 mode, there's
> nothing worth our time to protect such users from the perils of the
> world. And trying to remain
> ABI-compatible between experimental C++20 support and its subsequent
> C++20 support absolutely isn't one
> of the things worth our time, even for users that are not idiots. It's
> experimental, and We Mean It.
>

I've considered whether we should put things in an inline namespace until
they're stable, so that when we move it from std::__unstable to std::
there's an ABI break even if nothing else changes. Just to make it clear.

Maybe even a different namespace for each release. std::__gcc11, then
change it to __gcc12 just to make it incompatible.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-26 21:17         ` Jonathan Wakely
@ 2021-02-26 21:18           ` Ville Voutilainen
  2021-02-26 21:39             ` Jonathan Wakely
  0 siblings, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-26 21:18 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Jonathan Wakely, Thomas Rodgers, Thiago Macieira, libstdc++

On Fri, 26 Feb 2021 at 23:17, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:

> I've considered whether we should put things in an inline namespace until they're stable, so that when we move it from std::__unstable to std:: there's an ABI break even if nothing else changes. Just to make it clear.
>
> Maybe even a different namespace for each release. std::__gcc11, then change it to __gcc12 just to make it incompatible.

Sounds like overkill to me, just an additional burden to
remove/stabilize that inline namespacing in the happy event that we
don't
actually need to break anything.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-26 21:18           ` Ville Voutilainen
@ 2021-02-26 21:39             ` Jonathan Wakely
  0 siblings, 0 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-02-26 21:39 UTC (permalink / raw)
  To: Ville Voutilainen
  Cc: Jonathan Wakely, Thomas Rodgers, Thiago Macieira, libstdc++

On Fri, 26 Feb 2021, 21:19 Ville Voutilainen, <ville.voutilainen@gmail.com>
wrote:

> On Fri, 26 Feb 2021 at 23:17, Jonathan Wakely <jwakely.gcc@gmail.com>
> wrote:
>
> > I've considered whether we should put things in an inline namespace
> until they're stable, so that when we move it from std::__unstable to std::
> there's an ABI break even if nothing else changes. Just to make it clear.
> >
> > Maybe even a different namespace for each release. std::__gcc11, then
> change it to __gcc12 just to make it incompatible.
>
> Sounds like overkill to me, just an additional burden to
> remove/stabilize that inline namespacing in the happy event that we
> don't
> actually need to break anything.
>


Exactly why I've gone no further than considering it.

>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-26 18:47     ` Ville Voutilainen
@ 2021-02-26 23:53       ` Thiago Macieira
  2021-02-26 23:58         ` Ville Voutilainen
  2021-03-03 14:24         ` Jonathan Wakely
  0 siblings, 2 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 23:53 UTC (permalink / raw)
  To: Ville Voutilainen; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++

On Friday, 26 February 2021 10:47:09 PST Ville Voutilainen wrote:
> On Fri, 26 Feb 2021 at 20:43, Thiago Macieira via Libstdc++
> 
> <libstdc++@gcc.gnu.org> wrote:
> > Oh? That's not a really widely understood fact. I see people using C++20
> > right now all the time in the cpplang slack and no one seems to be
> > worried that support is experimental.
> 
> People writing toys on forums is not real production use that we
> should worry about, and anyone who uses
> a two-months-old standard in production use is going to survive
> possible breaking changes. Or if they won't,
> that's their stupid fault.

Think about this scenario. Let's say GCC 12 declares to C++20 support to be 
non-experimental.

Then someone writes a production tool that does:

#if __cpp_lib_atomic_wait >= 201907L
    a.wait(n);
#else
    fallback(a);
#endif

Should they also prevent people from compiling their code with GCC 11, which 
will define the macro and does support "-std=c++20"?

Is there a way to programmatically determine what level of the C++ Standard 
Library is officially supported so the requirement can be turned into a test?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-26 23:53       ` Thiago Macieira
@ 2021-02-26 23:58         ` Ville Voutilainen
  2021-02-27  0:11           ` Thiago Macieira
  2021-03-03 14:24         ` Jonathan Wakely
  1 sibling, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-26 23:58 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++

On Sat, 27 Feb 2021 at 01:53, Thiago Macieira <thiago.macieira@intel.com> wrote:
> Think about this scenario. Let's say GCC 12 declares to C++20 support to be
> non-experimental.
>
> Then someone writes a production tool that does:
>
> #if __cpp_lib_atomic_wait >= 201907L
>     a.wait(n);
> #else
>     fallback(a);
> #endif
>
> Should they also prevent people from compiling their code with GCC 11, which
> will define the macro and does support "-std=c++20"?

Yes, if there's an implementation bug in GCC 11 that renders the GCC
12 implementation incompatible.
This is not new.

> Is there a way to programmatically determine what level of the C++ Standard
> Library is officially supported so the requirement can be turned into a test?

No general one. Here we are talking about implementation bugs;
feature-testing macros are not
trying to cover those. And in the sunshine cases where we don't have
bugs or binary breaks,
defining the macros early is fine, so we're not going to postpone
defining them, in the hypothetical
event that someone might suggest that.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-26 23:58         ` Ville Voutilainen
@ 2021-02-27  0:11           ` Thiago Macieira
  2021-02-27  0:18             ` Ville Voutilainen
  2021-02-27  0:22             ` Marc Glisse
  0 siblings, 2 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-02-27  0:11 UTC (permalink / raw)
  To: Ville Voutilainen; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++

On Friday, 26 February 2021 15:58:01 PST Ville Voutilainen wrote:
> > Should they also prevent people from compiling their code with GCC 11,
> > which will define the macro and does support "-std=c++20"?
> 
> Yes, if there's an implementation bug in GCC 11 that renders the GCC
> 12 implementation incompatible.
> This is not new.

One would expect that implementation bugs -- not design flaws -- do not affect 
the entire functionality. And in most inline code, that is not a big deal 
since the operation was still achieved one way or the other.

What we're talking about here makes the entire set of classes unsuitable for 
consumption and dangerous, with no way to programmatically determine when 
they've become suitable. My worry is not that someone will rush to use them in 
2021. My worry is that well-tested code in 2024-2025 breaks because people are 
using a mix of systems that includes Ubuntu 22.04.

> No general one. Here we are talking about implementation bugs;
> feature-testing macros are not
> trying to cover those. And in the sunshine cases where we don't have
> bugs or binary breaks,
> defining the macros early is fine, so we're not going to postpone
> defining them, in the hypothetical
> event that someone might suggest that.

Fair enough, but we know we are not in the sunshine case. We know now, before 
the release, that it WILL change and cause silent incompatibilities. Does that 
change the conclusion?

How about the other way around: only enable the new, experimental code if the 
user defines _GLIBCXX_I_WANT_EXPERIMENTAL before the #includes?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-02-26 19:31       ` Andreas Schwab
@ 2021-02-27  0:13         ` Thiago Macieira
  2021-02-28 21:31           ` Hans-Peter Nilsson
  0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-02-27  0:13 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: gcc-patches, libstdc++

On Friday, 26 February 2021 11:31:00 PST Andreas Schwab wrote:
> On Feb 26 2021, Thiago Macieira wrote:
> > On Friday, 26 February 2021 10:14:42 PST Andreas Schwab wrote:
> >> On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
> >> > -    alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
> >> > +    alignas(__alignof__(int)) int _M_a;
> >> 
> >> Futexes must be aligned to 4 bytes.
> > 
> > Agreed, but doesn't this accomplish that?
> 
> No.  It uses whatever alignment the type already has, and is an
> elaborate no-op.

I thought so too when I read the original line. But I expected it was written 
like that for a reason, especially since the same pattern appears in other 
places.

I can change to "alignas(4)" (which is a GCC extension, I believe). Is that 
the correct solution?


-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-27  0:11           ` Thiago Macieira
@ 2021-02-27  0:18             ` Ville Voutilainen
  2021-02-27  0:36               ` Thiago Macieira
  2021-02-27  0:22             ` Marc Glisse
  1 sibling, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-27  0:18 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++

On Sat, 27 Feb 2021 at 02:11, Thiago Macieira <thiago.macieira@intel.com> wrote:
>
> On Friday, 26 February 2021 15:58:01 PST Ville Voutilainen wrote:
> > > Should they also prevent people from compiling their code with GCC 11,
> > > which will define the macro and does support "-std=c++20"?
> >
> > Yes, if there's an implementation bug in GCC 11 that renders the GCC
> > 12 implementation incompatible.
> > This is not new.
>
> One would expect that implementation bugs -- not design flaws -- do not affect
> the entire functionality. And in most inline code, that is not a big deal
> since the operation was still achieved one way or the other.
>
> What we're talking about here makes the entire set of classes unsuitable for
> consumption and dangerous, with no way to programmatically determine when
> they've become suitable. My worry is not that someone will rush to use them in
> 2021. My worry is that well-tested code in 2024-2025 breaks because people are
> using a mix of systems that includes Ubuntu 22.04.

There's nothing new in that either. I do, however, wonder how
dangerous using the current
implementation actually is? Anyhoo, almost regardless of that, we have
had situations like
that before. While we don't exactly want to repeat them (std::regex
in.. gcc8? comes to mind here),
I'm not entirely convinced that we are talking about a showstopper problem here.

> > No general one. Here we are talking about implementation bugs;
> > feature-testing macros are not
> > trying to cover those. And in the sunshine cases where we don't have
> > bugs or binary breaks,
> > defining the macros early is fine, so we're not going to postpone
> > defining them, in the hypothetical
> > event that someone might suggest that.
>
> Fair enough, but we know we are not in the sunshine case. We know now, before
> the release, that it WILL change and cause silent incompatibilities. Does that
> change the conclusion?

Do tell me what those silent incompatibilities really are. If they're
differences in inlined
code baked into client applications, I'm not sure I see what's so
earth-shattering there.

> How about the other way around: only enable the new, experimental code if the
> user defines _GLIBCXX_I_WANT_EXPERIMENTAL before the #includes?

That seems drastic. We wouldn't get pretty much any use coverage of
any new code,
and that would just sweep the rug from under the whole idea of
feature-testing macros,
half of which is allowing bold users to adopt new facilities early.
Now they would need
to fall back to seventeen different implementation-specific ways to
opt in to that adoption,
which takes us back a decade. Please explain how your concerns about
atomic waits
are *anywhere near* the cost of doing that?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-27  0:11           ` Thiago Macieira
  2021-02-27  0:18             ` Ville Voutilainen
@ 2021-02-27  0:22             ` Marc Glisse
  2021-02-27  0:30               ` Ville Voutilainen
  2021-02-27  0:43               ` Thiago Macieira
  1 sibling, 2 replies; 73+ messages in thread
From: Marc Glisse @ 2021-02-27  0:22 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Ville Voutilainen, Thomas Rodgers, libstdc++

On Fri, 26 Feb 2021, Thiago Macieira via Libstdc++ wrote:

> How about the other way around: only enable the new, experimental code if the
> user defines _GLIBCXX_I_WANT_EXPERIMENTAL before the #includes?

That's exactly what the user did, they passed -std=c++20...

Ok, it might help if the option was called -std=c++2a until it is stable, 
or if we defined a macro that advertises if we are in an experimental mode 
or not.

But ultimately, it is the user's responsibility to use working (not 
experimental) tools. And you can document that your library is only 
compatible with gcc-12+.

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-27  0:22             ` Marc Glisse
@ 2021-02-27  0:30               ` Ville Voutilainen
  2021-02-27  0:43               ` Thiago Macieira
  1 sibling, 0 replies; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-27  0:30 UTC (permalink / raw)
  To: libstdc++; +Cc: Thiago Macieira, Thomas Rodgers

On Sat, 27 Feb 2021 at 02:22, Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Fri, 26 Feb 2021, Thiago Macieira via Libstdc++ wrote:
>
> > How about the other way around: only enable the new, experimental code if the
> > user defines _GLIBCXX_I_WANT_EXPERIMENTAL before the #includes?
>
> That's exactly what the user did, they passed -std=c++20...
>
> Ok, it might help if the option was called -std=c++2a until it is stable,
> or if we defined a macro that advertises if we are in an experimental mode
> or not.
>
> But ultimately, it is the user's responsibility to use working (not
> experimental) tools. And you can document that your library is only
> compatible with gcc-12+.

To be fair, I partially understand where Thiago may be coming from.
Doing that is *horrible* for things like.. say.. Qt.
While it's not nice to opt in to new features for seventeen different
implementations, it's not much fun to disable
features due to bugs in seventeen different implementations either.
(Yeah, seventeen is a bit of an exaggeration.)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-27  0:18             ` Ville Voutilainen
@ 2021-02-27  0:36               ` Thiago Macieira
  2021-02-27  0:44                 ` Ville Voutilainen
  0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-02-27  0:36 UTC (permalink / raw)
  To: Ville Voutilainen; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++

On Friday, 26 February 2021 16:18:33 PST Ville Voutilainen wrote:
> There's nothing new in that either. I do, however, wonder how
> dangerous using the current
> implementation actually is? Anyhoo, almost regardless of that, we have
> had situations like
> that before. While we don't exactly want to repeat them (std::regex
> in.. gcc8? comes to mind here),
> I'm not entirely convinced that we are talking about a showstopper problem
> here.

The danger is not how they're implemented today. The danger is the change. 
That would imply a pre-change wait() is never woken up by a post-change 
notify() or vice-versa. That in turn causes deadlocks in latches, barriers and 
semaphores.

Depending on how the two implementations are achieved, the outcome may also 
depend on the order of waits and notifies and which side of the change is 
doing which operation. That in turn leads to heisenbugs in production which a 
developer will almost never see.

> > Fair enough, but we know we are not in the sunshine case. We know now,
> > before the release, that it WILL change and cause silent
> > incompatibilities. Does that change the conclusion?
> 
> Do tell me what those silent incompatibilities really are. If they're
> differences in inlined
> code baked into client applications, I'm not sure I see what's so
> earth-shattering there.

1) size of structures change
2) default template parameters change (silent ODR violations)
3) data protocol changes (whether certain static members are used or not)

> > How about the other way around: only enable the new, experimental code if
> > the user defines _GLIBCXX_I_WANT_EXPERIMENTAL before the #includes?
> 
> That seems drastic. We wouldn't get pretty much any use coverage of
> any new code,
> and that would just sweep the rug from under the whole idea of
> feature-testing macros,
> half of which is allowing bold users to adopt new facilities early.

Those goals seem incompatible. If the feature is experimental and not to be 
used in production, then we don't want users to adopt the facilities early. We 
want them to wait until we declare that it's no longer experimental.

> Now they would need
> to fall back to seventeen different implementation-specific ways to
> opt in to that adoption,
> which takes us back a decade.

I'm the last person to want those multiple ways of detecting when a feature is 
safe to be used. You know that. We specifically made it so Qt would NOT use 
any new C++14 or C++17 feature from the compiler or the Standard Library until 
the macros were present, thus helping force a certain compiler vendor to adopt 
the feature macros.

But this is exactly the issue: I *don't* want to have to know that 
__cpp_lib_atomic_wait == 201907L is only acceptable if __GNUC_MAJOR__ > 11.

> Please explain how your concerns about atomic waits
> are *anywhere near* the cost of doing that?

Which part? The part about why I think the implementation should be different? 
Or the part in which making a change causes future problems?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-27  0:22             ` Marc Glisse
  2021-02-27  0:30               ` Ville Voutilainen
@ 2021-02-27  0:43               ` Thiago Macieira
  1 sibling, 0 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-02-27  0:43 UTC (permalink / raw)
  To: libstdc++; +Cc: Ville Voutilainen, Thomas Rodgers, libstdc++

On Friday, 26 February 2021 16:22:34 PST Marc Glisse wrote:
> On Fri, 26 Feb 2021, Thiago Macieira via Libstdc++ wrote:
> > How about the other way around: only enable the new, experimental code if
> > the user defines _GLIBCXX_I_WANT_EXPERIMENTAL before the #includes?
> 
> That's exactly what the user did, they passed -std=c++20...
> 
> Ok, it might help if the option was called -std=c++2a until it is stable,
> or if we defined a macro that advertises if we are in an experimental mode
> or not.

I wouldn't mind that. If GCC documents that it's "c++20" or "c++23" when it's 
stable and no incompatibilities are expected (not counting exceptional 
circumstances), I would gladly accept it. Do note that it would go against 
Ville's email that you replied to that said we want to get early exposure and 
coverage. It means a single, obscure Standard Library feature could hold back 
the entire flag for the Core Language or vice-versa.

I don't think that is the best outcome, though. I would prefer it more fine-
grained. And yeah, in the sunshine case where everything works, it might come 
down to "oops, we made a mistake" (like <filesystem> file time clock).

This email thread is to advise we will have such a case. Are there steps we 
can take to avoid it?

> But ultimately, it is the user's responsibility to use working (not
> experimental) tools. And you can document that your library is only
> compatible with gcc-12+.

Give me a programmatic way and I'll be happy.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-27  0:36               ` Thiago Macieira
@ 2021-02-27  0:44                 ` Ville Voutilainen
  2021-02-27  0:53                   ` Thiago Macieira
  2021-03-03 14:30                   ` Jonathan Wakely
  0 siblings, 2 replies; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-27  0:44 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++

On Sat, 27 Feb 2021 at 02:36, Thiago Macieira <thiago.macieira@intel.com> wrote:
>
> On Friday, 26 February 2021 16:18:33 PST Ville Voutilainen wrote:
> > There's nothing new in that either. I do, however, wonder how
> > dangerous using the current
> > implementation actually is? Anyhoo, almost regardless of that, we have
> > had situations like
> > that before. While we don't exactly want to repeat them (std::regex
> > in.. gcc8? comes to mind here),
> > I'm not entirely convinced that we are talking about a showstopper problem
> > here.
>
> The danger is not how they're implemented today. The danger is the change.
> That would imply a pre-change wait() is never woken up by a post-change
> notify() or vice-versa. That in turn causes deadlocks in latches, barriers and
> semaphores.

Then we must be talking about something else than the topic of this
thread, which says "inefficient". :P
I need to re-read your patches, but not until tomorrow.

> > Do tell me what those silent incompatibilities really are. If they're
> > differences in inlined
> > code baked into client applications, I'm not sure I see what's so
> > earth-shattering there.
>
> 1) size of structures change
> 2) default template parameters change (silent ODR violations)
> 3) data protocol changes (whether certain static members are used or not)

In other words, ABI breaks, sure.

> > > How about the other way around: only enable the new, experimental code if
> > > the user defines _GLIBCXX_I_WANT_EXPERIMENTAL before the #includes?
> >
> > That seems drastic. We wouldn't get pretty much any use coverage of
> > any new code,
> > and that would just sweep the rug from under the whole idea of
> > feature-testing macros,
> > half of which is allowing bold users to adopt new facilities early.
>
> Those goals seem incompatible. If the feature is experimental and not to be
> used in production, then we don't want users to adopt the facilities early. We
> want them to wait until we declare that it's no longer experimental.

Well, we want *bold* users to adopt new facilities early. Not all users.
Feature-testing macros don't cover the problem of innocent users accidentally
opting in to something that's in flux; the general expectation is that innocent
users shouldn't do that.

> But this is exactly the issue: I *don't* want to have to know that
> __cpp_lib_atomic_wait == 201907L is only acceptable if __GNUC_MAJOR__ > 11.

Aye. I get that perfectly well.

> > Please explain how your concerns about atomic waits
> > are *anywhere near* the cost of doing that?
> Which part? The part about why I think the implementation should be different?
> Or the part in which making a change causes future problems?

The part where there is an ABI break that's insurmountable. But as I
said, I need to read
your patches with fresh eyes, not with past-midnight eyes.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-27  0:44                 ` Ville Voutilainen
@ 2021-02-27  0:53                   ` Thiago Macieira
  2021-02-27  1:03                     ` Ville Voutilainen
  2021-03-03 14:30                   ` Jonathan Wakely
  1 sibling, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-02-27  0:53 UTC (permalink / raw)
  To: Ville Voutilainen; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++

On Friday, 26 February 2021 16:44:44 PST Ville Voutilainen wrote:
> On Sat, 27 Feb 2021 at 02:36, Thiago Macieira <thiago.macieira@intel.com> 
wrote:
> > The danger is not how they're implemented today. The danger is the change.
> > That would imply a pre-change wait() is never woken up by a post-change
> > notify() or vice-versa. That in turn causes deadlocks in latches, barriers
> > and semaphores.
> 
> Then we must be talking about something else than the topic of this
> thread, which says "inefficient". :P

Well, if they were efficient in the first place, we wouldn't need a change. :)

The problem is that they ARE inefficient and they're implemented in such a way 
that a change to make them efficient breaks compatibility with the pre-change 
implementation.

> I need to re-read your patches, but not until tomorrow.

Fair enough. I'm disconnecting from the work VPN right now and won't be 
getting replies. Let me know if I can explain anything, though.

> Well, we want *bold* users to adopt new facilities early. Not all users.
> Feature-testing macros don't cover the problem of innocent users
> accidentally opting in to something that's in flux; the general expectation
> is that innocent users shouldn't do that.

I totally agree.

But how do we allow bold users to adopt the facilities in such a way not to 
endanger the other (italic?) users?

> > > Please explain how your concerns about atomic waits
> > > are *anywhere near* the cost of doing that?
> > 
> > Which part? The part about why I think the implementation should be
> > different? Or the part in which making a change causes future problems?
> 
> The part where there is an ABI break that's insurmountable. But as I
> said, I need to read
> your patches with fresh eyes, not with past-midnight eyes.

Hyvää viikonloppua, Ville.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-27  0:53                   ` Thiago Macieira
@ 2021-02-27  1:03                     ` Ville Voutilainen
  0 siblings, 0 replies; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-27  1:03 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++

On Sat, 27 Feb 2021 at 02:53, Thiago Macieira <thiago.macieira@intel.com> wrote:
> Hyvää viikonloppua, Ville.

Tenha um bom fim de semana, meu amigo. :)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-25 22:50 C++2a synchronisation inefficient in GCC 11 Thiago Macieira
  2021-02-26 11:19 ` Jonathan Wakely
  2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
@ 2021-02-27  1:13 ` Thomas Rodgers
  2021-02-27  1:29   ` Thomas Rodgers
  2021-02-27  2:02   ` Ville Voutilainen
  2 siblings, 2 replies; 73+ messages in thread
From: Thomas Rodgers @ 2021-02-27  1:13 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: libstdc++, trodgers, jwakely

We are SPECIFICALLY NOT committing to an ABI for wait/notify with GCC11. 
These features are experimental and I have no intention of treating this 
as a blocker.

On 2021-02-25 14:50, Thiago Macieira via Libstdc++ wrote:

> Hello all
> 
> I was investigating the implementation of std::atomic<T>::wait to use 
> inside
> my own code (Qt's QMutex and QSemaphore), when I realised that it has
> huge inefficiencies. Since everything in this implementation is inline 
> and,
> once released, it will tie our hands until the next ABI break 
> (libstdc++.so.
> 7). And no, inline namespaces and ABI tags are no different than ABI 
> breaks,
> they only make the ABI break more or less silent in the process.
> 
> Here's a summary of the findings:
> 
> 1) everything is inline
> 2) futex code is still behind a lot of code calling into 
> std::_Hash_bytes
> 3) other int-sized (incl. enums) atomics don't use futex
> 4) std::latch and std::counting_semaphore defaults preclude from using
> futex on Linux
> 5) std::barrier implementation also uses a type that futex(2) can't 
> handle
> 
> Here's what the code currently looks like: 
> https://gcc.godbolt.org/z/MeP8rr
> 
> Given how far we are in the release cycle, a simple solution is to 
> simply
> disable the entire solution by #if 0'ing out bits/atomic_wait.h. If
> __cpp_lib_atomic_wait isn't defined, the rest of the functionality 
> cascades
> disabled too. Then we can spend however long is necessary to get a good
> implementation for GCC 12.
> 
> I have patches for all but the first issue, but they are a result of 
> only 3
> hours of hacking to prove the concept. I will post them, but I do not 
> expect
> them to get merged as-is.
> 
> Details:
> 
> 1) everything is inline
> 
> I think this is unnecessary optimisation, especially since it expands 
> to a LOT
> of code (see link above). A cursory research into libc++ and MS STL 
> shows that
> the actual wait and notification are out-of-line. I recommend we do the 
> same
> for libstdc++: if we're going to syscall anyway, the extra function 
> call isn't
> going to be the performance bottleneck.
> 
> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050360.html>, 
> Jonathan
> wrote:
> 
>> Eventually we'll want to move the rest of this function (which doesn't
>> depend on the template argument) into the compiled library, but it's
>> better to be header-only for now.
> 
> But that has never happened.
> 
> Having out-of-line implementations would allow the implementation for 
> generic-
> sized atomic wait/notify to fine-tune as time and experience 
> progresses. The
> std::__detail::__waiter implementation looks clever, but it's unclear 
> if it is
> the best for all workloads. Moreover, the implementation is limited to 
> 16
> waiters.
> 
> Moreover, it would allow us to have support in the future for a 64-bit 
> futex.
> Linus did shoot down any need for it 13 years ago, but the world has 
> changed
> since and it's not inconceivable to have a pointer-sized futex on every
> platform, which on 64-bit would be 64-bit. Windows and Darwin do have 
> support
> for 64-bit compare-and-wait functionality.
> 
> And it would allow us to have futex-equivalent support in other OSes, 
> added at
> our leisure. The current implementation does not support Windows's
> WaitOnAddress and could never support it if released in the current 
> state.
> It's also possible other OSes will eventually add equivalent 
> functionality
> because of the C++ standard. For example, FreeBSD already has futex() 
> support
> inside its kernel in order to run Linux binaries, but it is not exposed 
> to the
> FreeBSD-personality syscall ABI.
> 
> Inlining certain functionality at a later date is possible. De-inlining 
> isn't.
> 
> 2) futex code for int is still behind a lot of boiler plate
> 
> It comes from the creation/use of std::__detail::__waiter in
> std::__atomic_wait and __atomic_notify. That does allow one to detect 
> whether
> there is contention in the first place and avoid the system call, but I 
> think
> that's premature optimisation.
> 
> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050362.html>, 
> Thomas
> says:
> 
>> __atomic_notify checks to see if that count is non-zero
>> before invoking the platform/semaphore notify because it is cheaper
>> to do the atomic load than it is to make the syscall() when there are 
>> no
>> waiters.
> 
> That's true, but it ignores the fact that users of 
> std::atomic<T>::wait() are
> likely to track contention by themselves on the value of the atomic (I 
> know I
> do). libc++, MS STL and the original atomic_wait[1 [1]] implementation 
> operate on
> direct system calls and don't try to track contention. Because of that,
> libstdc++ adding an extra layer of tracking is double book-keeping and
> violates "don't pay for what you don't need".
> 
> [1] https://github.com/ogiroux/atomic_wait
> 
> 3) other int-sized atomics don't use futex
> 
> The kernel doesn't care what we store in those 32 bits, only that they 
> are
> comparable (FUTEX_OP does provide less-than and greater-than operations 
> that
> are signed, but we don't use that and the kernel API can be later 
> extended to
> unsigned compares). Therefore, I would recommend we extend futex 
> support to
> all 32-bit signed scalar types. This would add:
> * pointers and long on 32-bit architectures
> * unsigned
> * untyped enums and typed enums on int & unsigned int
> * float
> 
> 4) std::latch and std::counting_semaphore defaults too high
> 
> The standard requires a ptrdiff_t in the API, but it does allow us to 
> set hard
> limits and/or defaults for best performance. Right now, std::latch is
> implemented using ptrdiff_t internally and std::counting_semaphore's 
> default
> limit uses PTRDIFF_MAX:
> 
> template<ptrdiff_t __least_max_value =
> __gnu_cxx::__int_traits<ptrdiff_t>::__max>
> class counting_semaphore
> 
> As a consequence, those two classes will use 64-bit atomics on 64-bit
> architectures, which means they will not use futex. I recommend 
> changing
> std::latch's internal to int and lowering the default maximum in
> std::counting_semaphore to INT_MAX.
> 
> 5) std::barrier uses a type that futex can't handle
> 
> The barrier phase is the actual type that gets waited / notified on, 
> but it is
> an enum class on unsigned char:
> 
> enum class __barrier_phase_t : unsigned char { };
> 
> This is a harder problem to solve, but I really do recommend making it 
> 32-bit
> in width. Whether each barrier phase holds only 256 values or the full 
> 32
> bits, I don't know (I've implemented with wrapping on 256).
> 
> What is of consequence is that the __state_t type could increase in 
> size
> ("could" because we can simply reduce the number of tickets in each 
> state
> entry). I haven't done enough research on this to know whether it is 
> the right
> way to go.



Links:
------
[1] https://github.com/ogiroux/atomic_wait

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-27  1:13 ` C++2a synchronisation inefficient in GCC 11 Thomas Rodgers
@ 2021-02-27  1:29   ` Thomas Rodgers
  2021-02-27  3:01     ` Thomas Rodgers
  2021-02-27  2:02   ` Ville Voutilainen
  1 sibling, 1 reply; 73+ messages in thread
From: Thomas Rodgers @ 2021-02-27  1:29 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: trodgers, jwakely, libstdc++

On 2021-02-26 17:13, Thomas Rodgers wrote:

> We are SPECIFICALLY NOT committing to an ABI for wait/notify with 
> GCC11. These features are experimental and I have no intention of 
> treating this as a blocker.
> 
> On 2021-02-25 14:50, Thiago Macieira via Libstdc++ wrote:
> 
>> Hello all
>> 
>> I was investigating the implementation of std::atomic<T>::wait to use 
>> inside
>> my own code (Qt's QMutex and QSemaphore), when I realised that it has
>> huge inefficiencies. Since everything in this implementation is inline 
>> and,
>> once released, it will tie our hands until the next ABI break 
>> (libstdc++.so.
>> 7). And no, inline namespaces and ABI tags are no different than ABI 
>> breaks,
>> they only make the ABI break more or less silent in the process.
>> 
>> Here's a summary of the findings:
>> 
>> 1) everything is inline
>> 2) futex code is still behind a lot of code calling into 
>> std::_Hash_bytes

It's likely possible to remove the dependence on _Hash_bytes. I looked 
at it while working on the most recent version of this code but decided 
to not touch it for GCC11 because...it's experimental and we
aren't committing to an ABI.

> 3) other int-sized (incl. enums) atomics don't use futex

The most recent version of this code extends the range of types which 
use futex to all 32bit types.

> 4) std::latch and std::counting_semaphore defaults preclude from using
> futex on Linux
> 5) std::barrier implementation also uses a type that futex(2) can't 
> handle
> 
> Here's what the code currently looks like: 
> https://gcc.godbolt.org/z/MeP8rr
> 
> Given how far we are in the release cycle, a simple solution is to 
> simply
> disable the entire solution by #if 0'ing out bits/atomic_wait.h. If
> __cpp_lib_atomic_wait isn't defined, the rest of the functionality 
> cascades
> disabled too. Then we can spend however long is necessary to get a good
> implementation for GCC 12.
> 
> I have patches for all but the first issue, but they are a result of 
> only 3
> hours of hacking to prove the concept. I will post them, but I do not 
> expect
> them to get merged as-is.
> 
> Details:
> 
> 1) everything is inline
> 
> I think this is unnecessary optimisation, especially since it expands 
> to a LOT
> of code (see link above). A cursory research into libc++ and MS STL 
> shows that
> the actual wait and notification are out-of-line. I recommend we do the 
> same
> for libstdc++: if we're going to syscall anyway, the extra function 
> call isn't
> going to be the performance bottleneck.
> 
> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050360.html>, 
> Jonathan
> wrote:
> 
> Eventually we'll want to move the rest of this function (which doesn't
> depend on the template argument) into the compiled library, but it's
> better to be header-only for now.
> But that has never happened.
> 
> Having out-of-line implementations would allow the implementation for 
> generic-
> sized atomic wait/notify to fine-tune as time and experience 
> progresses. The
> std::__detail::__waiter implementation looks clever, but it's unclear 
> if it is
> the best for all workloads. Moreover, the implementation is limited to 
> 16
> waiters.
> 
> Moreover, it would allow us to have support in the future for a 64-bit 
> futex.
> Linus did shoot down any need for it 13 years ago, but the world has 
> changed
> since and it's not inconceivable to have a pointer-sized futex on every
> platform, which on 64-bit would be 64-bit. Windows and Darwin do have 
> support
> for 64-bit compare-and-wait functionality.
> 
> And it would allow us to have futex-equivalent support in other OSes, 
> added at
> our leisure. The current implementation does not support Windows's
> WaitOnAddress and could never support it if released in the current 
> state.
> It's also possible other OSes will eventually add equivalent 
> functionality
> because of the C++ standard. For example, FreeBSD already has futex() 
> support
> inside its kernel in order to run Linux binaries, but it is not exposed 
> to the
> FreeBSD-personality syscall ABI.
> 
> Inlining certain functionality at a later date is possible. De-inlining 
> isn't.
> 
> 2) futex code for int is still behind a lot of boiler plate
> 
> It comes from the creation/use of std::__detail::__waiter in
> std::__atomic_wait and __atomic_notify. That does allow one to detect 
> whether
> there is contention in the first place and avoid the system call, but I 
> think
> that's premature optimisation.
> 
> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050362.html>, 
> Thomas
> says:
> 
> __atomic_notify checks to see if that count is non-zero
> before invoking the platform/semaphore notify because it is cheaper
> to do the atomic load than it is to make the syscall() when there are 
> no
> waiters.
> That's true, but it ignores the fact that users of 
> std::atomic<T>::wait() are
> likely to track contention by themselves on the value of the atomic (I 
> know I
> do). libc++, MS STL and the original atomic_wait[1 [1]] implementation 
> operate on
> direct system calls and don't try to track contention. Because of that,
> libstdc++ adding an extra layer of tracking is double book-keeping and
> violates "don't pay for what you don't need".
> 
> [1] https://github.com/ogiroux/atomic_wait
> 
> 3) other int-sized atomics don't use futex
> 
> The kernel doesn't care what we store in those 32 bits, only that they 
> are
> comparable (FUTEX_OP does provide less-than and greater-than operations 
> that
> are signed, but we don't use that and the kernel API can be later 
> extended to
> unsigned compares). Therefore, I would recommend we extend futex 
> support to
> all 32-bit signed scalar types. This would add:
> * pointers and long on 32-bit architectures
> * unsigned
> * untyped enums and typed enums on int & unsigned int
> * float
> 
> 4) std::latch and std::counting_semaphore defaults too high
> 
> The standard requires a ptrdiff_t in the API, but it does allow us to 
> set hard
> limits and/or defaults for best performance. Right now, std::latch is
> implemented using ptrdiff_t internally and std::counting_semaphore's 
> default
> limit uses PTRDIFF_MAX:
> 
> template<ptrdiff_t __least_max_value =
> __gnu_cxx::__int_traits<ptrdiff_t>::__max>
> class counting_semaphore
> 
> As a consequence, those two classes will use 64-bit atomics on 64-bit
> architectures, which means they will not use futex. I recommend 
> changing
> std::latch's internal to int and lowering the default maximum in
> std::counting_semaphore to INT_MAX.
> 
> 5) std::barrier uses a type that futex can't handle
> 
> The barrier phase is the actual type that gets waited / notified on, 
> but it is
> an enum class on unsigned char:
> 
> enum class __barrier_phase_t : unsigned char { };
> 
> This is a harder problem to solve, but I really do recommend making it 
> 32-bit
> in width. Whether each barrier phase holds only 256 values or the full 
> 32
> bits, I don't know (I've implemented with wrapping on 256).
> 
> What is of consequence is that the __state_t type could increase in 
> size
> ("could" because we can simply reduce the number of tickets in each 
> state
> entry). I haven't done enough research on this to know whether it is 
> the right
> way to go.

Links:
------
[1] https://github.com/ogiroux/atomic_wait

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-27  1:13 ` C++2a synchronisation inefficient in GCC 11 Thomas Rodgers
  2021-02-27  1:29   ` Thomas Rodgers
@ 2021-02-27  2:02   ` Ville Voutilainen
  1 sibling, 0 replies; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-27  2:02 UTC (permalink / raw)
  To: Thomas Rodgers
  Cc: Thiago Macieira, Thomas Rodgers, Jonathan Wakely, libstdc++

On Sat, 27 Feb 2021 at 03:53, Thomas Rodgers <rodgert@appliantology.com> wrote:
>
> We are SPECIFICALLY NOT committing to an ABI for wait/notify with GCC11.
> These features are experimental and I have no intention of treating this
> as a blocker.

Oh, treating it as a blocker would be madness. But Thiago has a point
- if you have GCC11 code
and GCC13 code, with the same feature-macros and everything compiles
the same, and there's
an ABI break between those, that's not entirely a non-issue. I'm *NOT*
suggesting that that's
a new problem; we did that with std::variant, although I don't exactly
recall whether it was actually
released before I changed the size of its index type depending on the
number of alternatives it can
have. I'm merely saying that it's not a non-issue. I'm yet to
determine how significant this is.
Let's just calmly look at it and ponder.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-27  1:29   ` Thomas Rodgers
@ 2021-02-27  3:01     ` Thomas Rodgers
  2021-03-01 17:46       ` Thomas Rodgers
  0 siblings, 1 reply; 73+ messages in thread
From: Thomas Rodgers @ 2021-02-27  3:01 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: trodgers, jwakely, libstdc++

On 2021-02-26 17:29, Thomas Rodgers wrote:

> On 2021-02-26 17:13, Thomas Rodgers wrote:
> 
> We are SPECIFICALLY NOT committing to an ABI for wait/notify with 
> GCC11. These features are experimental and I have no intention of 
> treating this as a blocker.
> 
> On 2021-02-25 14:50, Thiago Macieira via Libstdc++ wrote:
> 
> Hello all
> 
> I was investigating the implementation of std::atomic<T>::wait to use 
> inside
> my own code (Qt's QMutex and QSemaphore), when I realised that it has
> huge inefficiencies. Since everything in this implementation is inline 
> and,
> once released, it will tie our hands until the next ABI break 
> (libstdc++.so.
> 7). And no, inline namespaces and ABI tags are no different than ABI 
> breaks,
> they only make the ABI break more or less silent in the process.
> 
> Here's a summary of the findings:
> 
> 1) everything is inline
> 2) futex code is still behind a lot of code calling into 
> std::_Hash_bytes

It's likely possible to remove the dependence on _Hash_bytes. I looked 
at it while working on the most recent version of this code but decided 
to not touch it for GCC11 because...it's experimental and we
aren't committing to an ABI.

> 3) other int-sized (incl. enums) atomics don't use futex

The most recent version of this code extends the range of types which 
use futex to all 32bit types.

> 4) std::latch and std::counting_semaphore defaults preclude from using
> futex on Linux
> 5) std::barrier implementation also uses a type that futex(2) can't 
> handle
> 
> Here's what the code currently looks like: 
> https://gcc.godbolt.org/z/MeP8rr
> 
> Given how far we are in the release cycle, a simple solution is to 
> simply
> disable the entire solution by #if 0'ing out bits/atomic_wait.h. If
> __cpp_lib_atomic_wait isn't defined, the rest of the functionality 
> cascades
> disabled too. Then we can spend however long is necessary to get a good
> implementation for GCC 12.
> 
> I have patches for all but the first issue, but they are a result of 
> only 3
> hours of hacking to prove the concept. I will post them, but I do not 
> expect
> them to get merged as-is.
> 
> Details:
> 
> 1) everything is inline
> 
> I think this is unnecessary optimisation, especially since it expands 
> to a LOT
> of code (see link above). A cursory research into libc++ and MS STL 
> shows that
> the actual wait and notification are out-of-line. I recommend we do the 
> same
> for libstdc++: if we're going to syscall anyway, the extra function 
> call isn't
> going to be the performance bottleneck.
> 
> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050360.html>, 
> Jonathan
> wrote:
> 
> Eventually we'll want to move the rest of this function (which doesn't
> depend on the template argument) into the compiled library, but it's
> better to be header-only for now.
> But that has never happened.
> 
> Having out-of-line implementations would allow the implementation for 
> generic-
> sized atomic wait/notify to fine-tune as time and experience 
> progresses. The
> std::__detail::__waiter implementation looks clever, but it's unclear 
> if it is
> the best for all workloads. Moreover, the implementation is limited to 
> 16
> waiters.
> 
> Moreover, it would allow us to have support in the future for a 64-bit 
> futex.
> Linus did shoot down any need for it 13 years ago, but the world has 
> changed
> since and it's not inconceivable to have a pointer-sized futex on every
> platform, which on 64-bit would be 64-bit. Windows and Darwin do have 
> support
> for 64-bit compare-and-wait functionality.
> 
> And it would allow us to have futex-equivalent support in other OSes, 
> added at
> our leisure. The current implementation does not support Windows's
> WaitOnAddress and could never support it if released in the current 
> state.
> It's also possible other OSes will eventually add equivalent 
> functionality
> because of the C++ standard. For example, FreeBSD already has futex() 
> support
> inside its kernel in order to run Linux binaries, but it is not exposed 
> to the
> FreeBSD-personality syscall ABI.
> 
> Inlining certain functionality at a later date is possible. De-inlining 
> isn't.
> 
> 2) futex code for int is still behind a lot of boiler plate
> 
> It comes from the creation/use of std::__detail::__waiter in
> std::__atomic_wait and __atomic_notify. That does allow one to detect 
> whether
> there is contention in the first place and avoid the system call, but I 
> think
> that's premature optimisation.
> 
> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050362.html>, 
> Thomas
> says:
> 
> __atomic_notify checks to see if that count is non-zero
> before invoking the platform/semaphore notify because it is cheaper
> to do the atomic load than it is to make the syscall() when there are 
> no
> waiters.
> That's true, but it ignores the fact that users of 
> std::atomic<T>::wait() are
> likely to track contention by themselves on the value of the atomic (I 
> know I
> do). libc++, MS STL and the original atomic_wait[1 [1]] implementation 
> operate on
> direct system calls and don't try to track contention. Because of that,
> libstdc++ adding an extra layer of tracking is double book-keeping and
> violates "don't pay for what you don't need".
> 
> [1] https://github.com/ogiroux/atomic_wait
> 

THe final version that Olivier committed to libc++ does track 
contention.

> 3) other int-sized atomics don't use futex
> 
> The kernel doesn't care what we store in those 32 bits, only that they 
> are
> comparable (FUTEX_OP does provide less-than and greater-than operations 
> that
> are signed, but we don't use that and the kernel API can be later 
> extended to
> unsigned compares). Therefore, I would recommend we extend futex 
> support to
> all 32-bit signed scalar types. This would add:
> * pointers and long on 32-bit architectures
> * unsigned
> * untyped enums and typed enums on int & unsigned int
> * float
> 
> 4) std::latch and std::counting_semaphore defaults too high
> 
> The standard requires a ptrdiff_t in the API, but it does allow us to 
> set hard
> limits and/or defaults for best performance. Right now, std::latch is
> implemented using ptrdiff_t internally and std::counting_semaphore's 
> default
> limit uses PTRDIFF_MAX:
> 
> template<ptrdiff_t __least_max_value =
> __gnu_cxx::__int_traits<ptrdiff_t>::__max>
> class counting_semaphore
> 
> As a consequence, those two classes will use 64-bit atomics on 64-bit
> architectures, which means they will not use futex. I recommend 
> changing
> std::latch's internal to int and lowering the default maximum in
> std::counting_semaphore to INT_MAX.
> 
> 5) std::barrier uses a type that futex can't handle
> 
> The barrier phase is the actual type that gets waited / notified on, 
> but it is
> an enum class on unsigned char:
> 
> enum class __barrier_phase_t : unsigned char { };
> 
> This is a harder problem to solve, but I really do recommend making it 
> 32-bit
> in width. Whether each barrier phase holds only 256 values or the full 
> 32
> bits, I don't know (I've implemented with wrapping on 256).
> 
> What is of consequence is that the __state_t type could increase in 
> size
> ("could" because we can simply reduce the number of tickets in each 
> state
> entry). I haven't done enough research on this to know whether it is 
> the right
> way to go.

My understanding from Olivier Giroux, who's barrier implementation code 
this is derived from, is that he extensively tested it's performance on 
a wide variety of hardware, including time on Summit. Short of hard data 
to the contrary, I am not inclined to second guess the implementation he 
tested (which uses char tickets).

Links:
------
[1] https://github.com/ogiroux/atomic_wait

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-02-26 15:59   ` [PATCH 4/5] barrier: use int instead of unsigned char for the phase state Thiago Macieira
@ 2021-02-28 15:05     ` Hans-Peter Nilsson
  2021-03-01 16:28       ` Thomas Rodgers
  2021-03-01 17:24       ` Thiago Macieira
  0 siblings, 2 replies; 73+ messages in thread
From: Hans-Peter Nilsson @ 2021-02-28 15:05 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: libstdc++, gcc-patches

On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
> ints can be used in futexes. chars can't.

Shouldn't that be an atomic type instead of a bare int then?

> ---
>  libstdc++-v3/include/std/barrier | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
> index e09212dfcb9..ae058bd3dc3 100644
> --- a/libstdc++-v3/include/std/barrier
> +++ b/libstdc++-v3/include/std/barrier
> @@ -70,7 +70,7 @@ It looks different from literature pseudocode for two main reasons:
>
>  */
>
> -  enum class __barrier_phase_t : unsigned char { };
> +  enum class __barrier_phase_t : int { };

brgds, H-P

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-02-27  0:13         ` Thiago Macieira
@ 2021-02-28 21:31           ` Hans-Peter Nilsson
  2021-03-01  8:56             ` Richard Biener
  2021-03-01 16:32             ` Thomas Rodgers
  0 siblings, 2 replies; 73+ messages in thread
From: Hans-Peter Nilsson @ 2021-02-28 21:31 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Andreas Schwab, libstdc++, gcc-patches



On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:

> On Friday, 26 February 2021 11:31:00 PST Andreas Schwab wrote:
> > On Feb 26 2021, Thiago Macieira wrote:
> > > On Friday, 26 February 2021 10:14:42 PST Andreas Schwab wrote:
> > >> On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
> > >> > -    alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
> > >> > +    alignas(__alignof__(int)) int _M_a;
> > >>
> > >> Futexes must be aligned to 4 bytes.
> > >
> > > Agreed, but doesn't this accomplish that?
> >
> > No.  It uses whatever alignment the type already has, and is an
> > elaborate no-op.
>
> I thought so too when I read the original line. But I expected it was written
> like that for a reason, especially since the same pattern appears in other
> places.
>
> I can change to "alignas(4)" (which is a GCC extension, I believe). Is that
> the correct solution?

IMNSHO make use of the corresponding atomic type.  Then there'd
be no need for separate what's-the-right-align-curse games.

brgds, H-P

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-02-28 21:31           ` Hans-Peter Nilsson
@ 2021-03-01  8:56             ` Richard Biener
  2021-03-03 14:56               ` Jonathan Wakely
  2021-03-01 16:32             ` Thomas Rodgers
  1 sibling, 1 reply; 73+ messages in thread
From: Richard Biener @ 2021-03-01  8:56 UTC (permalink / raw)
  To: Hans-Peter Nilsson
  Cc: Thiago Macieira, GCC Patches, libstdc++, Andreas Schwab

On Sun, Feb 28, 2021 at 10:53 PM Hans-Peter Nilsson <hp@bitrange.com> wrote:
>
>
>
> On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
>
> > On Friday, 26 February 2021 11:31:00 PST Andreas Schwab wrote:
> > > On Feb 26 2021, Thiago Macieira wrote:
> > > > On Friday, 26 February 2021 10:14:42 PST Andreas Schwab wrote:
> > > >> On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
> > > >> > -    alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
> > > >> > +    alignas(__alignof__(int)) int _M_a;
> > > >>
> > > >> Futexes must be aligned to 4 bytes.
> > > >
> > > > Agreed, but doesn't this accomplish that?
> > >
> > > No.  It uses whatever alignment the type already has, and is an
> > > elaborate no-op.
> >
> > I thought so too when I read the original line. But I expected it was written
> > like that for a reason, especially since the same pattern appears in other
> > places.
> >
> > I can change to "alignas(4)" (which is a GCC extension, I believe). Is that
> > the correct solution?
>
> IMNSHO make use of the corresponding atomic type.  Then there'd
> be no need for separate what's-the-right-align-curse games.

Or align as the corresponding atomic type (in case using an actual
std::atomic<int> is undesirable).  OTOH the proposed code isn't
any more bogus than the previous ...

Richard.

> brgds, H-P

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-02-28 15:05     ` Hans-Peter Nilsson
@ 2021-03-01 16:28       ` Thomas Rodgers
  2021-03-01 17:24       ` Thiago Macieira
  1 sibling, 0 replies; 73+ messages in thread
From: Thomas Rodgers @ 2021-03-01 16:28 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: libstdc++, gcc-patches

On 2021-02-28 07:05, Hans-Peter Nilsson wrote:

> On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
> 
>> ints can be used in futexes. chars can't.
> 
> Shouldn't that be an atomic type instead of a bare int then?
> 

It's an atomic_ref.

>> ---
>> libstdc++-v3/include/std/barrier | 21 ++++++++++++---------
>> 1 file changed, 12 insertions(+), 9 deletions(-)
>> 
>> diff --git a/libstdc++-v3/include/std/barrier 
>> b/libstdc++-v3/include/std/barrier
>> index e09212dfcb9..ae058bd3dc3 100644
>> --- a/libstdc++-v3/include/std/barrier
>> +++ b/libstdc++-v3/include/std/barrier
>> @@ -70,7 +70,7 @@ It looks different from literature pseudocode for 
>> two main reasons:
>> 
>> */
>> 
>> -  enum class __barrier_phase_t : unsigned char { };
>> +  enum class __barrier_phase_t : int { };
> 
> brgds, H-P

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-02-28 21:31           ` Hans-Peter Nilsson
  2021-03-01  8:56             ` Richard Biener
@ 2021-03-01 16:32             ` Thomas Rodgers
  1 sibling, 0 replies; 73+ messages in thread
From: Thomas Rodgers @ 2021-03-01 16:32 UTC (permalink / raw)
  To: Hans-Peter Nilsson
  Cc: Thiago Macieira, gcc-patches, libstdc++, Andreas Schwab

On 2021-02-28 13:31, Hans-Peter Nilsson wrote:

>> On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
>> 
>> On Friday, 26 February 2021 11:31:00 PST Andreas Schwab wrote: On Feb 
>> 26 2021, Thiago Macieira wrote: On Friday, 26 February 2021 10:14:42 
>> PST Andreas Schwab wrote: On Feb 26 2021, Thiago Macieira via 
>> Gcc-patches wrote: -    alignas(__alignof__(ptrdiff_t)) ptrdiff_t 
>> _M_a;
>> +    alignas(__alignof__(int)) int _M_a;
>> Futexes must be aligned to 4 bytes.
>> 
>> Agreed, but doesn't this accomplish that?
> No.  It uses whatever alignment the type already has, and is an
> elaborate no-op.
> I thought so too when I read the original line. But I expected it was 
> written
> like that for a reason, especially since the same pattern appears in 
> other
> places.

> I can change to "alignas(4)" (which is a GCC extension, I believe). Is 
> that
> the correct solution?
> IMNSHO make use of the corresponding atomic type.  Then there'd
> be no need for separate what's-the-right-align-curse games.

There is no predicate wait on atomic<T>.

> brgds, H-P

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-02-28 15:05     ` Hans-Peter Nilsson
  2021-03-01 16:28       ` Thomas Rodgers
@ 2021-03-01 17:24       ` Thiago Macieira
  2021-03-01 17:38         ` Thomas Rodgers
  2021-03-01 18:12         ` Ville Voutilainen
  1 sibling, 2 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 17:24 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: libstdc++, gcc-patches

On Sunday, 28 February 2021 07:05:47 PST Hans-Peter Nilsson wrote:
> On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
> > ints can be used in futexes. chars can't.
> 
> Shouldn't that be an atomic type instead of a bare int then?

There are a couple of atomic_refs:

      using __atomic_phase_ref_t = std::__atomic_ref<__barrier_phase_t>;
      using __atomic_phase_const_ref_t = std::__atomic_ref<const 
__barrier_phase_t>;

And _M_phase, despite being non-atomic, is never accessed without the 
atomic_ref, aside from the constructor. Both arrive() and wait() start off by 
creating the atomic_ref.

But I confess I don't understand this code sufficiently to say it is correct. 
I'm simply saying that waiting on unsigned chars will not use a futex, at 
least until https://lkml.org/lkml/2019/12/4/1373 is merged into the kernel.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-03-01 17:24       ` Thiago Macieira
@ 2021-03-01 17:38         ` Thomas Rodgers
  2021-03-01 17:40           ` Thomas Rodgers
  2021-03-01 18:06           ` Thiago Macieira
  2021-03-01 18:12         ` Ville Voutilainen
  1 sibling, 2 replies; 73+ messages in thread
From: Thomas Rodgers @ 2021-03-01 17:38 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Hans-Peter Nilsson, libstdc++, gcc-patches

On 2021-03-01 09:24, Thiago Macieira via Libstdc++ wrote:

> On Sunday, 28 February 2021 07:05:47 PST Hans-Peter Nilsson wrote: On 
> Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote: ints can be 
> used in futexes. chars can't.
> Shouldn't that be an atomic type instead of a bare int then?

> There are a couple of atomic_refs:
> 
>      using __atomic_phase_ref_t = std::__atomic_ref<__barrier_phase_t>;
>      using __atomic_phase_const_ref_t = std::__atomic_ref<const
> __barrier_phase_t>;
> 
> And _M_phase, despite being non-atomic, is never accessed without the
> atomic_ref, aside from the constructor. Both arrive() and wait() start 
> off by
> creating the atomic_ref.

If it's non-atomic, then how is wait() supposed to wait on it, 
atomically?

> But I confess I don't understand this code sufficiently to say it is 
> correct.
> I'm simply saying that waiting on unsigned chars will not use a futex, 
> at
> least until https://lkml.org/lkml/2019/12/4/1373 is merged into the 
> kernel.

And I am not disagreeing with that. I am, however saying, that I know 
this particular implementation (well the upstream one it is based on) 
has been extensively tested by the original author (ogiroux) including 
time on Summit. If we are going to start changing his design decisions 
(beyond the largely cosmetic, not algorithmic, ones that I have made as 
per Jonathan's request), they should be motivated by more than a 'well 
we feel int's would be better here because Futex' justification.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-03-01 17:38         ` Thomas Rodgers
@ 2021-03-01 17:40           ` Thomas Rodgers
  2021-03-01 18:06           ` Thiago Macieira
  1 sibling, 0 replies; 73+ messages in thread
From: Thomas Rodgers @ 2021-03-01 17:40 UTC (permalink / raw)
  To: libstdc++, gcc-patches; +Cc: Hans-Peter Nilsson

Botched replying to the list, sending again

On 2021-03-01 09:38, Thomas Rodgers wrote:

> On 2021-03-01 09:24, Thiago Macieira via Libstdc++ wrote:
> 
>> On Sunday, 28 February 2021 07:05:47 PST Hans-Peter Nilsson wrote: On 
>> Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote: ints can be 
>> used in futexes. chars can't.
>> Shouldn't that be an atomic type instead of a bare int then?
> 
>> There are a couple of atomic_refs:
>> 
>> using __atomic_phase_ref_t = std::__atomic_ref<__barrier_phase_t>;
>> using __atomic_phase_const_ref_t = std::__atomic_ref<const
>> __barrier_phase_t>;
>> 
>> And _M_phase, despite being non-atomic, is never accessed without the
>> atomic_ref, aside from the constructor. Both arrive() and wait() start 
>> off by
>> creating the atomic_ref.
> 
> If it's non-atomic, then how is wait() supposed to wait on it, 
> atomically?
> 
>> But I confess I don't understand this code sufficiently to say it is 
>> correct.
>> I'm simply saying that waiting on unsigned chars will not use a futex, 
>> at
>> least until https://lkml.org/lkml/2019/12/4/1373 is merged into the 
>> kernel.
> 
> And I am not disagreeing with that. I am, however saying, that I know 
> this particular implementation (well the upstream one it is based on) 
> has been extensively tested by the original author (ogiroux) including 
> time on Summit. If we are going to start changing his design decisions 
> (beyond the largely cosmetic, not algorithmic, ones that I have made as 
> per Jonathan's request), they should be motivated by more than a 'well 
> we feel int's would be better here because Futex' justification.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-27  3:01     ` Thomas Rodgers
@ 2021-03-01 17:46       ` Thomas Rodgers
  2021-03-01 18:00         ` Thiago Macieira
  0 siblings, 1 reply; 73+ messages in thread
From: Thomas Rodgers @ 2021-03-01 17:46 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: trodgers, jwakely, libstdc++

I discussed the prematureness of the contention optimization with 
ogiroux this morning. Confirming, libc++ does do this, and in his 
measurements it saves on the order of 50us.

Regarding the use of _Hash_impl::hash(), that can be changed to just use 
the pointer value itself. This is what libatomic does. I'm happy to 
propose that as a subsequent patch to the most recent version of this 
code which I submitted recently.

On 2021-02-26 19:01, Thomas Rodgers wrote:

> On 2021-02-26 17:29, Thomas Rodgers wrote:
> 
>> On 2021-02-26 17:13, Thomas Rodgers wrote:
>> 
>> We are SPECIFICALLY NOT committing to an ABI for wait/notify with 
>> GCC11. These features are experimental and I have no intention of 
>> treating this as a blocker.
>> 
>> On 2021-02-25 14:50, Thiago Macieira via Libstdc++ wrote:
>> 
>> Hello all
>> 
>> I was investigating the implementation of std::atomic<T>::wait to use 
>> inside
>> my own code (Qt's QMutex and QSemaphore), when I realised that it has
>> huge inefficiencies. Since everything in this implementation is inline 
>> and,
>> once released, it will tie our hands until the next ABI break 
>> (libstdc++.so.
>> 7). And no, inline namespaces and ABI tags are no different than ABI 
>> breaks,
>> they only make the ABI break more or less silent in the process.
>> 
>> Here's a summary of the findings:
>> 
>> 1) everything is inline
>> 2) futex code is still behind a lot of code calling into 
>> std::_Hash_bytes
> 
> It's likely possible to remove the dependence on _Hash_bytes. I looked 
> at it while working on the most recent version of this code but decided 
> to not touch it for GCC11 because...it's experimental and we
> aren't committing to an ABI.
> 
>> 3) other int-sized (incl. enums) atomics don't use futex
> 
> The most recent version of this code extends the range of types which 
> use futex to all 32bit types.
> 
>> 4) std::latch and std::counting_semaphore defaults preclude from using
>> futex on Linux
>> 5) std::barrier implementation also uses a type that futex(2) can't 
>> handle
>> 
>> Here's what the code currently looks like: 
>> https://gcc.godbolt.org/z/MeP8rr
>> 
>> Given how far we are in the release cycle, a simple solution is to 
>> simply
>> disable the entire solution by #if 0'ing out bits/atomic_wait.h. If
>> __cpp_lib_atomic_wait isn't defined, the rest of the functionality 
>> cascades
>> disabled too. Then we can spend however long is necessary to get a 
>> good
>> implementation for GCC 12.
>> 
>> I have patches for all but the first issue, but they are a result of 
>> only 3
>> hours of hacking to prove the concept. I will post them, but I do not 
>> expect
>> them to get merged as-is.
>> 
>> Details:
>> 
>> 1) everything is inline
>> 
>> I think this is unnecessary optimisation, especially since it expands 
>> to a LOT
>> of code (see link above). A cursory research into libc++ and MS STL 
>> shows that
>> the actual wait and notification are out-of-line. I recommend we do 
>> the same
>> for libstdc++: if we're going to syscall anyway, the extra function 
>> call isn't
>> going to be the performance bottleneck.
>> 
>> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050360.html>, 
>> Jonathan
>> wrote:
>> 
>> Eventually we'll want to move the rest of this function (which doesn't
>> depend on the template argument) into the compiled library, but it's
>> better to be header-only for now.
>> But that has never happened.
>> 
>> Having out-of-line implementations would allow the implementation for 
>> generic-
>> sized atomic wait/notify to fine-tune as time and experience 
>> progresses. The
>> std::__detail::__waiter implementation looks clever, but it's unclear 
>> if it is
>> the best for all workloads. Moreover, the implementation is limited to 
>> 16
>> waiters.
>> 
>> Moreover, it would allow us to have support in the future for a 64-bit 
>> futex.
>> Linus did shoot down any need for it 13 years ago, but the world has 
>> changed
>> since and it's not inconceivable to have a pointer-sized futex on 
>> every
>> platform, which on 64-bit would be 64-bit. Windows and Darwin do have 
>> support
>> for 64-bit compare-and-wait functionality.
>> 
>> And it would allow us to have futex-equivalent support in other OSes, 
>> added at
>> our leisure. The current implementation does not support Windows's
>> WaitOnAddress and could never support it if released in the current 
>> state.
>> It's also possible other OSes will eventually add equivalent 
>> functionality
>> because of the C++ standard. For example, FreeBSD already has futex() 
>> support
>> inside its kernel in order to run Linux binaries, but it is not 
>> exposed to the
>> FreeBSD-personality syscall ABI.
>> 
>> Inlining certain functionality at a later date is possible. 
>> De-inlining isn't.
>> 
>> 2) futex code for int is still behind a lot of boiler plate
>> 
>> It comes from the creation/use of std::__detail::__waiter in
>> std::__atomic_wait and __atomic_notify. That does allow one to detect 
>> whether
>> there is contention in the first place and avoid the system call, but 
>> I think
>> that's premature optimisation.
>> 
>> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050362.html>, 
>> Thomas
>> says:
>> 
>> __atomic_notify checks to see if that count is non-zero
>> before invoking the platform/semaphore notify because it is cheaper
>> to do the atomic load than it is to make the syscall() when there are 
>> no
>> waiters.
>> That's true, but it ignores the fact that users of 
>> std::atomic<T>::wait() are
>> likely to track contention by themselves on the value of the atomic (I 
>> know I
>> do). libc++, MS STL and the original atomic_wait[1 [1]] implementation 
>> operate on
>> direct system calls and don't try to track contention. Because of 
>> that,
>> libstdc++ adding an extra layer of tracking is double book-keeping and
>> violates "don't pay for what you don't need".
>> 
>> [1] https://github.com/ogiroux/atomic_wait
> 
> THe final version that Olivier committed to libc++ does track 
> contention.
> 
>> 3) other int-sized atomics don't use futex
>> 
>> The kernel doesn't care what we store in those 32 bits, only that they 
>> are
>> comparable (FUTEX_OP does provide less-than and greater-than 
>> operations that
>> are signed, but we don't use that and the kernel API can be later 
>> extended to
>> unsigned compares). Therefore, I would recommend we extend futex 
>> support to
>> all 32-bit signed scalar types. This would add:
>> * pointers and long on 32-bit architectures
>> * unsigned
>> * untyped enums and typed enums on int & unsigned int
>> * float
>> 
>> 4) std::latch and std::counting_semaphore defaults too high
>> 
>> The standard requires a ptrdiff_t in the API, but it does allow us to 
>> set hard
>> limits and/or defaults for best performance. Right now, std::latch is
>> implemented using ptrdiff_t internally and std::counting_semaphore's 
>> default
>> limit uses PTRDIFF_MAX:
>> 
>> template<ptrdiff_t __least_max_value =
>> __gnu_cxx::__int_traits<ptrdiff_t>::__max>
>> class counting_semaphore
>> 
>> As a consequence, those two classes will use 64-bit atomics on 64-bit
>> architectures, which means they will not use futex. I recommend 
>> changing
>> std::latch's internal to int and lowering the default maximum in
>> std::counting_semaphore to INT_MAX.
>> 
>> 5) std::barrier uses a type that futex can't handle
>> 
>> The barrier phase is the actual type that gets waited / notified on, 
>> but it is
>> an enum class on unsigned char:
>> 
>> enum class __barrier_phase_t : unsigned char { };
>> 
>> This is a harder problem to solve, but I really do recommend making it 
>> 32-bit
>> in width. Whether each barrier phase holds only 256 values or the full 
>> 32
>> bits, I don't know (I've implemented with wrapping on 256).
>> 
>> What is of consequence is that the __state_t type could increase in 
>> size
>> ("could" because we can simply reduce the number of tickets in each 
>> state
>> entry). I haven't done enough research on this to know whether it is 
>> the right
>> way to go.
> 
> My understanding from Olivier Giroux, who's barrier implementation code 
> this is derived from, is that he extensively tested it's performance on 
> a wide variety of hardware, including time on Summit. Short of hard 
> data to the contrary, I am not inclined to second guess the 
> implementation he tested (which uses char tickets).
> 
> Links:
> ------
> [1] https://github.com/ogiroux/atomic_wait

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-03-01 17:46       ` Thomas Rodgers
@ 2021-03-01 18:00         ` Thiago Macieira
  2021-03-01 18:34           ` Thomas Rodgers
  0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 18:00 UTC (permalink / raw)
  To: Thomas Rodgers; +Cc: trodgers, jwakely, libstdc++

On Monday, 1 March 2021 09:46:08 PST Thomas Rodgers wrote:
> I discussed the prematureness of the contention optimization with ogiroux
> this morning. Confirming, libc++ does do this, and in his measurements it
> saves on the order of 50us.
> 
> Regarding the use of _Hash_impl::hash(), that can be changed to just use the
> pointer value itself. This is what libatomic does. I'm happy to propose
> that as a subsequent patch to the most recent version of this code which I
> submitted recently.

Is the benchmark source somewhere I can take a look at?

I'm worried that it is benchmarking the wrong thing. Can we benchmark latch 
and counting_semaphore instead? Those already track contention on the waitable 
atomic by themselves.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-03-01 17:38         ` Thomas Rodgers
  2021-03-01 17:40           ` Thomas Rodgers
@ 2021-03-01 18:06           ` Thiago Macieira
  2021-03-01 19:08             ` Thomas Rodgers
  1 sibling, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 18:06 UTC (permalink / raw)
  To: Thomas Rodgers; +Cc: Hans-Peter Nilsson, libstdc++, gcc-patches

On Monday, 1 March 2021 09:38:58 PST Thomas Rodgers wrote:
> > And _M_phase, despite being non-atomic, is never accessed without the
> > atomic_ref, aside from the constructor. Both arrive() and wait() start
> > off by
> > creating the atomic_ref.
> 
> If it's non-atomic, then how is wait() supposed to wait on it,
> atomically?

Hey, it's your code :-)

It is using atomics to operate on the value. It's just that the type on the 
structure isn't an atomic by itself.

Why, I don't know. Olivier's original code did use atomics 
<https://github.com/ogiroux/atomic_wait/blob/master/include/barrier#L55-L56>:
    std::atomic<ptrdiff_t>             expected_adjustment;
    std::atomic<__phase_t>             phase;


> And I am not disagreeing with that. I am, however saying, that I know
> this particular implementation (well the upstream one it is based on)
> has been extensively tested by the original author (ogiroux) including
> time on Summit. If we are going to start changing his design decisions
> (beyond the largely cosmetic, not algorithmic, ones that I have made as
> per Jonathan's request), they should be motivated by more than a 'well
> we feel int's would be better here because Futex' justification.

That's a reasonable request.

I'd like to see the benchmark results of using a directly-futexable type vs 
using unsigned char. But even the timing results need to be weighed against 
the increased memory use for std::barrier, which means it's not a directly-
objective conclusion. And given we *may* get 8-bit futexes, it might be worth 
keeping them this way and just tell people with large contentions to upgrade.

That of course rests on the contended atomic_wait being out-of-line.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-03-01 17:24       ` Thiago Macieira
  2021-03-01 17:38         ` Thomas Rodgers
@ 2021-03-01 18:12         ` Ville Voutilainen
  2021-03-01 19:44           ` Thiago Macieira
  1 sibling, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-03-01 18:12 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Hans-Peter Nilsson, libstdc++, gcc-patches List

On Mon, 1 Mar 2021 at 20:09, Thiago Macieira via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
>
> On Sunday, 28 February 2021 07:05:47 PST Hans-Peter Nilsson wrote:
> > On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
> > > ints can be used in futexes. chars can't.
> >
> > Shouldn't that be an atomic type instead of a bare int then?
>
> There are a couple of atomic_refs:
>
>       using __atomic_phase_ref_t = std::__atomic_ref<__barrier_phase_t>;
>       using __atomic_phase_const_ref_t = std::__atomic_ref<const
> __barrier_phase_t>;
>
> And _M_phase, despite being non-atomic, is never accessed without the
> atomic_ref, aside from the constructor. Both arrive() and wait() start off by
> creating the atomic_ref.
>
> But I confess I don't understand this code sufficiently to say it is correct.
> I'm simply saying that waiting on unsigned chars will not use a futex, at
> least until https://lkml.org/lkml/2019/12/4/1373 is merged into the kernel.

I do have a question about the intent/concern here, regardless of what
your patch technically
does. The ABI break _is_ your concern, and the "heisenbugs" you were
worried about would
in fact be caused by the ABI break? So if you embed these things in
your QAtomicThing, the ABI
break may mess it up(*)? Is that a correct understanding of the concern here?

(*) And inlining might not help because users might compile a
different std::atomic_thing in their
program than what your DSO has been compiled with, and you may end up
using mixtures.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-03-01 18:00         ` Thiago Macieira
@ 2021-03-01 18:34           ` Thomas Rodgers
  2021-03-01 19:11             ` Thiago Macieira
  0 siblings, 1 reply; 73+ messages in thread
From: Thomas Rodgers @ 2021-03-01 18:34 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Thomas Rodgers, jwakely, libstdc++



----- Original Message -----
> On Monday, 1 March 2021 09:46:08 PST Thomas Rodgers wrote:
> > I discussed the prematureness of the contention optimization with ogiroux
> > this morning. Confirming, libc++ does do this, and in his measurements it
> > saves on the order of 50us.
> > 
> > Regarding the use of _Hash_impl::hash(), that can be changed to just use
> > the
> > pointer value itself. This is what libatomic does. I'm happy to propose
> > that as a subsequent patch to the most recent version of this code which I
> > submitted recently.
> 
> Is the benchmark source somewhere I can take a look at?
> 

I'd have to ask Olivier.

> I'm worried that it is benchmarking the wrong thing. Can we benchmark latch
> and counting_semaphore instead? Those already track contention on the
> waitable
> atomic by themselves.
>

So is your concern here that e.g. latch::count_down() will do an extra atomic load?
 
> --
> Thiago Macieira - thiago.macieira (AT) intel.com
>   Software Architect - Intel DPG Cloud Engineering
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-03-01 18:06           ` Thiago Macieira
@ 2021-03-01 19:08             ` Thomas Rodgers
  0 siblings, 0 replies; 73+ messages in thread
From: Thomas Rodgers @ 2021-03-01 19:08 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Hans-Peter Nilsson, libstdc++, gcc-patches

On 2021-03-01 10:06, Thiago Macieira wrote:

> On Monday, 1 March 2021 09:38:58 PST Thomas Rodgers wrote: And 
> _M_phase, despite being non-atomic, is never accessed without the
> atomic_ref, aside from the constructor. Both arrive() and wait() start
> off by
> creating the atomic_ref.
> If it's non-atomic, then how is wait() supposed to wait on it,
> atomically?

> Hey, it's your code :-)
> 
> It is using atomics to operate on the value. It's just that the type on 
> the
> structure isn't an atomic by itself.

> Why, I don't know. Olivier's original code did use atomics
> <https://github.com/ogiroux/atomic_wait/blob/master/include/barrier#L55-L56>:
>    std::atomic<ptrdiff_t>             expected_adjustment;
>    std::atomic<__phase_t>             phase;

It is atomic, in that it is always an atomic_ref. This change was made 
at
Jonathan's request.

> And I am not disagreeing with that. I am, however saying, that I know
> this particular implementation (well the upstream one it is based on)
> has been extensively tested by the original author (ogiroux) including
> time on Summit. If we are going to start changing his design decisions
> (beyond the largely cosmetic, not algorithmic, ones that I have made as
> per Jonathan's request), they should be motivated by more than a 'well
> we feel int's would be better here because Futex' justification.

> That's a reasonable request.
> 
> I'd like to see the benchmark results of using a directly-futexable 
> type vs
> using unsigned char. But even the timing results need to be weighed 
> against
> the increased memory use for std::barrier, which means it's not a 
> directly-
> objective conclusion. And given we *may* get 8-bit futexes, it might be 
> worth
> keeping them this way and just tell people with large contentions to 
> upgrade.

We may also want to introduce a central barrier type for memory 
constrained environments. I specifically
removed that from this patch because it -

1) wasn't clear how we'd go about making that decision
2) this support in GCC11 is experimental

> That of course rests on the contended atomic_wait being out-of-line.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-03-01 18:34           ` Thomas Rodgers
@ 2021-03-01 19:11             ` Thiago Macieira
  0 siblings, 0 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 19:11 UTC (permalink / raw)
  To: Thomas Rodgers; +Cc: Thomas Rodgers, jwakely, libstdc++

On Monday, 1 March 2021 10:34:22 PST Thomas Rodgers wrote:
> > I'm worried that it is benchmarking the wrong thing. Can we benchmark
> > latch
> > and counting_semaphore instead? Those already track contention on the
> > waitable
> > atomic by themselves.
> 
> So is your concern here that e.g. latch::count_down() will do an extra
> atomic load?

I'm concerned we're violating "Don't Pay for What You Don't Need". Waiting 
algorithms should be written to track the contention by themselves. If they 
say they want to wake or wait, the runtime shouldn't second-guess them.

Take a look at how glibc's sem_post and sem_wait are implemented. On systems 
with 64-bit atomics, they store the number of waiters in the high 32-bit; on 
systems without, they store a single bit indicating whether there's anyone 
waiting. The 64-bit code summarises to:

void acquire()
{
    auto &low = low_half_by_ref();	// handle big-endian, reinterpret_cast,
    uint64_t d = _M_counter.fetch_add(1ULL << 32);
    for (;;) {
        if ((d & SEM_VALUE_MASK) == 0) {
            low.wait(d);
            d = _M_counter.load(std::memory_order_relaxed);
        } else {
            if (_M_counter.compare_exchange_strong(d, d - 1, ...))
                break;
        }
    _M_counter.fetch_sub(1ULL << 32);
}

void release(int n = 1)
{
    auto &low = low_half_by_ref();	// handle big-endian, reinterpret_cast,
    uint64_d = _M_counter.fetch_add(n);
    if (d >> SEM_NWAITERS_SHIFT)
        if (n == 1)
            low.notify_one();
        else
            low.notify_all();  // ought to be "notify_many(n)"
}

It's even simpler in the latch, where any value different from "the last to 
arrive" is "contended". 

    _GLIBCXX_ALWAYS_INLINE void
    count_down(ptrdiff_t __update = 1)
    {
      auto const __old = __atomic_impl::fetch_sub(&_M_a,
                                    __update, memory_order::release);
      if (__old == __update)
        __atomic_impl::notify_all(&_M_a);
    }

There's no need to track contention again here. The only case where it would 
be useful is if there's only one thread operating on this latch, but if that's 
the case then why are you using a latch in the first place? And again, if 
that's an issue, then we can use the LSB or MSB to indicate that there are 
waiters waiting (we don't need a count of waiters).
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-03-01 18:12         ` Ville Voutilainen
@ 2021-03-01 19:44           ` Thiago Macieira
  2021-03-01 20:35             ` Ville Voutilainen
  0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 19:44 UTC (permalink / raw)
  To: Ville Voutilainen, libstdc++; +Cc: gcc-patches List

On Monday, 1 March 2021 10:12:35 PST Ville Voutilainen wrote:
> I do have a question about the intent/concern here, regardless of what
> your patch technically
> does. The ABI break _is_ your concern, and the "heisenbugs" you were
> worried about would
> in fact be caused by the ABI break? So if you embed these things in
> your QAtomicThing, the ABI
> break may mess it up(*)? Is that a correct understanding of the concern
> here?

Let me clarify. I am stating that the current implementation is inefficient 
because Linux currently only implements 32-bit futexes. With that in mind, I 
am arguing we need to change the implementation in a way that will break ABI 
in a new release.

The concern is that such a change, despite being in experimental code for 
which ABI stability has never been promised, is still troublesome. The problem 
there is that even people who have read the documentation and waited until 
2024 to write code using the feature may still be affected. This isn't about 
Qt because I have no plans to use wait and notify in inline API.

But you can see someone doing:

#if __cplusplus >= 202002L && __has_include(<latch>)
#  include <latch>
#else
#  error "Please upgrade your compiler & standard library"
#endif

and using <latch> in their inline code. And as you say, if they then mix DSOs 
compiled with different versions of GCC, one of them might be the old, 
experimental and binary-incompatible code. Remember that before April 2024, 
the most recent Ubuntu LTS will be 22.04, which will be released before GCC 
12, which means it will contain GCC 11.

So, wholly aside from how we fix the inefficiencies, we must decide how to 
deal with the ABI break. We can:

a) not have an ABI break in the first place, by having the above code not 
   compile with GCC until we promise ABI compatibility
b) cause a non-silent ABI break
c) accept the silent ABI break

I advocate for (a) and vehemently disagree with (c). Meanwhile, (b) can be 
something like (untested!):

 _GLIBCXX_BEGIN_NAMESPACE_VERSION
   namespace __detail
   {
+    struct __waiters;
+    inline namespace __experimental
+    {
+      // from atomic_wait.cc
+      __waiters &__get_waiter(const void *);
+    }
     using __platform_wait_t = int;

     constexpr auto __atomic_spin_count_1 = 16;
@@ -187,11 +193,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       static __waiters&
       _S_for(const void* __t)
       {
-       const unsigned char __mask = 0xf;
-       static __waiters __w[__mask + 1];
-
-       auto __key = _Hash_impl::hash(__t) & __mask;
-       return __w[__key];
+       return __get_waiter(__t);
       }
     };

That way, any DSO compiled using GCC 11 will fail to load when using GCC 12's 
libstdc++. And probably put "std::__detail::__experimental" in the 
GLIBCXX_EXPERIMENTAL ELF version so it's even clearer that it's experimental, 
thus helping Linux distributions and other binary artefact distributors 
realise they've made a mistake.

I still don't like (b) because it pushes the problem to the wrong people: the 
packagers. But it's far better than (c).

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-03-01 19:44           ` Thiago Macieira
@ 2021-03-01 20:35             ` Ville Voutilainen
  2021-03-01 21:54               ` Thiago Macieira
  0 siblings, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-03-01 20:35 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: libstdc++, gcc-patches List

On Mon, 1 Mar 2021 at 21:44, Thiago Macieira <thiago.macieira@intel.com> wrote:
> But you can see someone doing:
>
> #if __cplusplus >= 202002L && __has_include(<latch>)
> #  include <latch>
> #else
> #  error "Please upgrade your compiler & standard library"
> #endif
>
> and using <latch> in their inline code. And as you say, if they then mix DSOs
> compiled with different versions of GCC, one of them might be the old,
> experimental and binary-incompatible code. Remember that before April 2024,
> the most recent Ubuntu LTS will be 22.04, which will be released before GCC
> 12, which means it will contain GCC 11.
>
> So, wholly aside from how we fix the inefficiencies, we must decide how to
> deal with the ABI break. We can:
>
> a) not have an ABI break in the first place, by having the above code not
>    compile with GCC until we promise ABI compatibility
> b) cause a non-silent ABI break
> c) accept the silent ABI break
>
> I advocate for (a) and vehemently disagree with (c). Meanwhile, (b) can be
> something like (untested!):

I can't make the above code work, in any reasonable manner, because
it's doing feature detection incorrectly. :)
What I can imagine doing, however, is this:

1) a published IS always bumps feature-macro values (this won't help
now, it's a future fix, we don't currently do this in WG21)
2) an implementation uses the pre-IS draft values before it deems itself stable
3) users who want stability should require the feature-testing macro
to have at least the IS value
4) adventurous users can either use the macro without checking its
value, or use at least the value that gives them
whatever shiny toy they're interested in

With that, we can keep allowing adventurous users to opt in early, and
you have a portable way to detect that
features are stable, for an implementation-defined definition of "stable".

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-03-01 20:35             ` Ville Voutilainen
@ 2021-03-01 21:54               ` Thiago Macieira
  2021-03-01 22:04                 ` Ville Voutilainen
  0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 21:54 UTC (permalink / raw)
  To: Ville Voutilainen; +Cc: libstdc++, gcc-patches List

On Monday, 1 March 2021 12:35:05 PST Ville Voutilainen wrote:
> I can't make the above code work, in any reasonable manner, because
> it's doing feature detection incorrectly. :)
> What I can imagine doing, however, is this:
> 
> 1) a published IS always bumps feature-macro values (this won't help
> now, it's a future fix, we don't currently do this in WG21)

This is mostly already the case. I haven't seen any need to bump the values.

> 2) an implementation uses the pre-IS draft values before it deems itself
> stable 

Before the IS is stable? From what I've seen so far, the macros are always 
around the month of the latest draft of a given feature. So if an 
implementation implemented an earlier draft, the macro version would increase 
in a new draft coming before the IS.

I think we just need to be careful of updates done after the latest draft is 
adopted, usually by NBs. Those have to bump the macro too. I think you're in 
good speaking terms with the Finland NB :-)

> 3) users who want stability should require the feature-testing macro
> to have at least the IS value

True. That's not the stability I was talking about, but it could be done.

> 4) adventurous users can either use the macro without checking its
> value, or use at least the value that gives them
> whatever shiny toy they're interested in

Fair enough.

Please note I am not talking about the stability of the feature as described 
in the standard or a proposal or draft. I am talking about the stability of 
the implementation. C++20 is out and the atomic-wait feature is stable, as 
defined by the standard. What isn't stable is GCC's implementation of it.

But your suggestion does work. We don't need to apply them to all macros, only 
those that are new in a given version, like __cpp_lib_atomic_wait or 
__cpp_lib_latch in this case. Alternatively, implementations can set the macro 
to a given value below the standard-required one (incl. 1) to indicate that 
they haven't achieved stability.

That would make my check:

#if __cplusplus >= 202002L && __has_include(<latch>)
#  include <latch>
#endif
#if __cpp_lib_latch < 201907L
#  error "Please upgrade your Standard Library"
#endif

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-03-01 21:54               ` Thiago Macieira
@ 2021-03-01 22:04                 ` Ville Voutilainen
  2021-03-01 22:21                   ` Thiago Macieira
  0 siblings, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-03-01 22:04 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: libstdc++, gcc-patches List

On Mon, 1 Mar 2021 at 23:54, Thiago Macieira <thiago.macieira@intel.com> wrote:
> But your suggestion does work. We don't need to apply them to all macros, only
> those that are new in a given version, like __cpp_lib_atomic_wait or
> __cpp_lib_latch in this case. Alternatively, implementations can set the macro
> to a given value below the standard-required one (incl. 1) to indicate that
> they haven't achieved stability.
>
> That would make my check:
>
> #if __cplusplus >= 202002L && __has_include(<latch>)
> #  include <latch>
> #endif
> #if __cpp_lib_latch < 201907L
> #  error "Please upgrade your Standard Library"
> #endif

Well, this would be different. What I'm suggesting is not quite that;
for any *new* facility, we'd make sure
that its draft macro and the final IS macro are different, but the
minimum value is the first draft version,
not anything below it. I don't care too much, that approach and yours
would work the same way. Things that already
had an IS value for a macro and haven't changed since don't need to be
changed. And we don't
need to bump all values of existing facilities either, just for those
that got changes, so some existing macros
would be considered lost causes. Like the ones we're talking about,
because the cats are already out of the
bag.

I'll write a paper. That won't help you now, but it gives us tools in
the future.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-03-01 22:04                 ` Ville Voutilainen
@ 2021-03-01 22:21                   ` Thiago Macieira
  2021-03-01 22:31                     ` Ville Voutilainen
  0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 22:21 UTC (permalink / raw)
  To: Ville Voutilainen; +Cc: libstdc++, gcc-patches List

On Monday, 1 March 2021 14:04:34 PST Ville Voutilainen wrote:
> Well, this would be different. What I'm suggesting is not quite that;
> for any *new* facility, we'd make sure
> that its draft macro and the final IS macro are different, but the
> minimum value is the first draft version,
> not anything below it. I don't care too much, that approach and yours
> would work the same way. Things that already
> had an IS value for a macro and haven't changed since don't need to be
> changed. And we don't
> need to bump all values of existing facilities either, just for those
> that got changes, so some existing macros
> would be considered lost causes. Like the ones we're talking about,
> because the cats are already out of the
> bag.

But the code I posted, if people are careful to use write like I did, would 
allow us to have the experimental "we're not sure this is right" 
implementation of atomic waits, latches, barriers and semaphores right now.

It would simply require that we decrement the macros by 1 in the libstdc++ 
headers.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-03-01 22:21                   ` Thiago Macieira
@ 2021-03-01 22:31                     ` Ville Voutilainen
  2021-03-01 22:40                       ` Thiago Macieira
  0 siblings, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-03-01 22:31 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: libstdc++, gcc-patches List

On Tue, 2 Mar 2021 at 00:21, Thiago Macieira <thiago.macieira@intel.com> wrote:

> But the code I posted, if people are careful to use write like I did, would
> allow us to have the experimental "we're not sure this is right"
> implementation of atomic waits, latches, barriers and semaphores right now.

The code assumes that as soon as __cplusplus bumps and a header is
present, things
are stable. I don't think that's a safe assumption to make.
Furthermore, the second  #if
tries to check a feature-testing macro without including either the
corresponding header
or <version>. That doesn't work. You need to either include <version>
and check a macro,
or check that <latch> exists, then include <latch> and then check the macro.

But other than that, sure, as QoI, vendors could already provide the
standard macros with
numbers that are lower than the standard ever specified. Going
forward, if existing facilities
are changed, this stops working because now you'd have to track the
WPs to see which
values are "vendor-bogus". I find it better to just change the macros
whose facilities changed
during a standard cycle, and in those cases bump the IS macro, that'll
then work going forward.
In the meanwhile, when vendors can, they could use the technique you
describe, but that's
barely worth proposing as an SG10 guideline because they would've
needed to do it already, but didn't. :P

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
  2021-03-01 22:31                     ` Ville Voutilainen
@ 2021-03-01 22:40                       ` Thiago Macieira
  0 siblings, 0 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 22:40 UTC (permalink / raw)
  To: Ville Voutilainen; +Cc: libstdc++, gcc-patches List

On Monday, 1 March 2021 14:31:09 PST Ville Voutilainen wrote:
> On Tue, 2 Mar 2021 at 00:21, Thiago Macieira <thiago.macieira@intel.com> 
wrote:
> > But the code I posted, if people are careful to use write like I did,
> > would
> > allow us to have the experimental "we're not sure this is right"
> > implementation of atomic waits, latches, barriers and semaphores right
> > now.
> 
> The code assumes that as soon as __cplusplus bumps and a header is
> present, things
> are stable.

Not exactly. Re-posting the code for reference:

#if __cplusplus >= 202002L && __has_include(<latch>)
#  include <latch>
#endif
#if __cpp_lib_latch < 201907L
#  error "Please upgrade your Standard Library"
#endif

The first section simply assumes that <latch> can be #included. The 
__cplusplus check is necessary because MSVC STL's headers are present but 
can't be #include'd otherwise (they cause #error).

It's the second check that is authoritative.

> I don't think that's a safe assumption to make.
> Furthermore, the second  #if
> tries to check a feature-testing macro without including either the
> corresponding header
> or <version>. That doesn't work. You need to either include <version>
> and check a macro,
> or check that <latch> exists, then include <latch> and then check the macro.

<version> would work, but why can't you check the macro anyway? It may trigger 
a warning with GCC's -Wundef, but it's just a warning. The preprocessor is 
defined to replace any undefined identifier with 0. If you want to avoid the 
warning, you could write:

#if defined(__cpp_lib_latch) && __cpp_lib_latch < 201907L

Is there anything I'm not seeing?

> But other than that, sure, as QoI, vendors could already provide the
> standard macros with
> numbers that are lower than the standard ever specified. Going
> forward, if existing facilities
> are changed, this stops working because now you'd have to track the
> WPs to see which
> values are "vendor-bogus". I find it better to just change the macros
> whose facilities changed
> during a standard cycle, and in those cases bump the IS macro, that'll
> then work going forward.
> In the meanwhile, when vendors can, they could use the technique you
> describe, but that's
> barely worth proposing as an SG10 guideline because they would've
> needed to do it already, but didn't. :P

I am not opposed to that. Your solution is better.

But this solution is less work on the standard and still works, even if it 
creates a little more work on the users of said features. It's not 
unsurmountable, since we often need to check which C++ standard edition it 
came with anyway. So it doesn't matter what value it assumes: we'll be 
consulting a table anyway.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-26 23:53       ` Thiago Macieira
  2021-02-26 23:58         ` Ville Voutilainen
@ 2021-03-03 14:24         ` Jonathan Wakely
  2021-03-03 17:12           ` Thiago Macieira
  1 sibling, 1 reply; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 14:24 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Ville Voutilainen, Thomas Rodgers, libstdc++

On 26/02/21 15:53 -0800, Thiago Macieira via Libstdc++ wrote:
>On Friday, 26 February 2021 10:47:09 PST Ville Voutilainen wrote:
>> On Fri, 26 Feb 2021 at 20:43, Thiago Macieira via Libstdc++
>>
>> <libstdc++@gcc.gnu.org> wrote:
>> > Oh? That's not a really widely understood fact. I see people using C++20
>> > right now all the time in the cpplang slack and no one seems to be
>> > worried that support is experimental.
>>
>> People writing toys on forums is not real production use that we
>> should worry about, and anyone who uses
>> a two-months-old standard in production use is going to survive
>> possible breaking changes. Or if they won't,
>> that's their stupid fault.
>
>Think about this scenario. Let's say GCC 12 declares to C++20 support to be
>non-experimental.
>
>Then someone writes a production tool that does:
>
>#if __cpp_lib_atomic_wait >= 201907L
>    a.wait(n);
>#else
>    fallback(a);
>#endif

I agree this is a problem. One created by the feature test macros
(because bfore they were added, you just had to figure out which
version was OK to use for yourself and test __GNUC__ or similar).

But it's not a new problem for GCC 11, and not specific to the atomic
wait feature. It's worth further consideration though.



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-02-27  0:44                 ` Ville Voutilainen
  2021-02-27  0:53                   ` Thiago Macieira
@ 2021-03-03 14:30                   ` Jonathan Wakely
  2021-03-03 17:07                     ` Thiago Macieira
  1 sibling, 1 reply; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 14:30 UTC (permalink / raw)
  To: Ville Voutilainen; +Cc: Thiago Macieira, Thomas Rodgers, libstdc++

On 27/02/21 02:44 +0200, Ville Voutilainen via Libstdc++ wrote:
>On Sat, 27 Feb 2021 at 02:36, Thiago Macieira <thiago.macieira@intel.com> wrote:
>> Those goals seem incompatible. If the feature is experimental and not to be
>> used in production, then we don't want users to adopt the facilities early. We
>> want them to wait until we declare that it's no longer experimental.
>
>Well, we want *bold* users to adopt new facilities early. Not all users.

Yes, exactly. They could still do that if some additional option/macro
was needed to get to those new facilities though.

My concern is that if we add a new "I know what I'm doing" option,
everybody defines it (because that's what blog posts and youtube
videos and conference talks tell them to do, because that gives them
the new shiny things) and then it just becomes a default.

Maybe Qt wouldn't define it, but plenty would.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
                     ` (4 preceding siblings ...)
  2021-02-26 18:14   ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Andreas Schwab
@ 2021-03-03 14:34   ` Jonathan Wakely
  2021-03-03 17:14     ` Thiago Macieira
  5 siblings, 1 reply; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 14:34 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: libstdc++, gcc-patches

On 26/02/21 07:59 -0800, Thiago Macieira via Libstdc++ wrote:
>ints can be used as futex on Linux. ptrdiff_t on 64-bit Linux can't.

Yes, we should do this for GCC 11.


> libstdc++-v3/include/std/latch | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/libstdc++-v3/include/std/latch b/libstdc++-v3/include/std/latch
>index ef8c301e5e9..156aea5c5e5 100644
>--- a/libstdc++-v3/include/std/latch
>+++ b/libstdc++-v3/include/std/latch
>@@ -48,7 +48,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   public:
>     static constexpr ptrdiff_t
>     max() noexcept
>-    { return __gnu_cxx::__int_traits<ptrdiff_t>::__max; }
>+    { return __gnu_cxx::__int_traits<int>::__max; }
>
>     constexpr explicit latch(ptrdiff_t __expected) noexcept
>       : _M_a(__expected) { }
>@@ -85,7 +85,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>     }
>
>   private:
>-    alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
>+    alignas(__alignof__(int)) int _M_a;
>   };
> _GLIBCXX_END_NAMESPACE_VERSION
> } // namespace
>-- 
>2.30.1
>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int
  2021-02-26 15:59   ` [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int Thiago Macieira
@ 2021-03-03 14:34     ` Jonathan Wakely
  2021-03-03 16:21       ` Jonathan Wakely
  0 siblings, 1 reply; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 14:34 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: libstdc++, gcc-patches

On 26/02/21 07:59 -0800, Thiago Macieira via Libstdc++ wrote:
>The kernel doesn't care what we store in those 32 bits, only that they are
>comparable. This commit adds:
> * pointers and long on 32-bit architectures
> * unsigned
> * untyped enums and typed enums on int & unsigned int
> * float
>
>We're not using FUTEX_OP anywhere today. The kernel reserves 4 bits for
>this field but has only used 6 values so far, so it can be extended to
>unsigned compares in the future, if needed.

And this one for GCC 11 too.

> libstdc++-v3/include/bits/atomic_wait.h | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
>diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
>index 424fccbe4c5..1c6bda2e2b6 100644
>--- a/libstdc++-v3/include/bits/atomic_wait.h
>+++ b/libstdc++-v3/include/bits/atomic_wait.h
>@@ -65,7 +65,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>     template<typename _Tp>
>       inline constexpr bool __platform_wait_uses_type
> #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>-	= is_same_v<remove_cv_t<_Tp>, __platform_wait_t>;
>+	= is_scalar_v<remove_cv_t<_Tp>> && sizeof(_Tp) == sizeof(__platform_wait_t)
>+	  && alignof(_Tp) >= alignof(__platform_wait_t);
> #else
> 	= false;
> #endif
>@@ -91,13 +92,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>     template<typename _Tp>
>       void
>-      __platform_wait(const _Tp* __addr, __platform_wait_t __val) noexcept
>+      __platform_wait(const _Tp* __addr, _Tp __val) noexcept
>       {
> 	for(;;)
> 	  {
> 	    auto __e = syscall (SYS_futex, static_cast<const void*>(__addr),
> 				  static_cast<int>(__futex_wait_flags::__wait_private),
>-				    __val, nullptr);
>+				  static_cast<__platform_wait_t>(__val), nullptr);
> 	    if (!__e || errno == EAGAIN)
> 	      break;
> 	    else if (errno != EINTR)
>-- 
>2.30.1
>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 5/5] barrier: optimise by not having the hasher in a loop
  2021-02-26 15:59   ` [PATCH 5/5] barrier: optimise by not having the hasher in a loop Thiago Macieira
@ 2021-03-03 14:36     ` Jonathan Wakely
  0 siblings, 0 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 14:36 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: libstdc++, gcc-patches

On 26/02/21 07:59 -0800, Thiago Macieira via Libstdc++ wrote:
>Our thread's ID does not change so we don't have to get it every time
>and hash it every time.

This looks good for GCC 11.

(This one wouldn't be an ABI break to change later, but we might as
well do it now, as we have the patch now).


> libstdc++-v3/include/std/barrier | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
>diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
>index ae058bd3dc3..eb31a89b175 100644
>--- a/libstdc++-v3/include/std/barrier
>+++ b/libstdc++-v3/include/std/barrier
>@@ -101,12 +101,10 @@ It looks different from literature pseudocode for two main reasons:
>       }
>
>       bool
>-      _M_arrive(__barrier_phase_t __old_phase)
>+      _M_arrive(__barrier_phase_t __old_phase, size_t __current)
>       {
> 	size_t __current_expected = _M_expected;
>-	std::hash<std::thread::id> __hasher;
>-	size_t __current = __hasher(std::this_thread::get_id())
>-					  % ((_M_expected + 1) >> 1);
>+        __current %= ((__current_expected + 1) >> 1);
>
> 	const auto __half_step = _S_add_to_phase(__old_phase, 1);
> 	const auto __full_step = _S_add_to_phase(__old_phase, 2);
>@@ -167,11 +165,13 @@ It looks different from literature pseudocode for two main reasons:
>       [[nodiscard]] arrival_token
>       arrive(ptrdiff_t __update)
>       {
>+	std::hash<std::thread::id> __hasher;
>+	size_t __current = __hasher(std::this_thread::get_id());
> 	__atomic_phase_ref_t __phase(_M_phase);
> 	const auto __old_phase = __phase.load(memory_order_relaxed);
> 	for(; __update; --__update)
> 	  {
>-	    if(_M_arrive(__old_phase))
>+	    if(_M_arrive(__old_phase, __current))
> 	      {
> 		_M_completion();
> 		_M_expected += _M_expected_adjustment.load(memory_order_relaxed);
>-- 
>2.30.1
>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-03-01  8:56             ` Richard Biener
@ 2021-03-03 14:56               ` Jonathan Wakely
  2021-03-03 15:02                 ` Andreas Schwab
                                   ` (2 more replies)
  0 siblings, 3 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 14:56 UTC (permalink / raw)
  To: Richard Biener
  Cc: Hans-Peter Nilsson, Thiago Macieira, Andreas Schwab, libstdc++,
	GCC Patches

On 01/03/21 09:56 +0100, Richard Biener via Libstdc++ wrote:
>On Sun, Feb 28, 2021 at 10:53 PM Hans-Peter Nilsson <hp@bitrange.com> wrote:
>>
>>
>>
>> On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
>>
>> > On Friday, 26 February 2021 11:31:00 PST Andreas Schwab wrote:
>> > > On Feb 26 2021, Thiago Macieira wrote:
>> > > > On Friday, 26 February 2021 10:14:42 PST Andreas Schwab wrote:
>> > > >> On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
>> > > >> > -    alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
>> > > >> > +    alignas(__alignof__(int)) int _M_a;
>> > > >>
>> > > >> Futexes must be aligned to 4 bytes.
>> > > >
>> > > > Agreed, but doesn't this accomplish that?
>> > >
>> > > No.  It uses whatever alignment the type already has, and is an
>> > > elaborate no-op.
>> >
>> > I thought so too when I read the original line. But I expected it was written
>> > like that for a reason, especially since the same pattern appears in other
>> > places.
>> >
>> > I can change to "alignas(4)" (which is a GCC extension, I believe). Is that
>> > the correct solution?
>>
>> IMNSHO make use of the corresponding atomic type.  Then there'd
>> be no need for separate what's-the-right-align-curse games.

That won't work though, because we need direct access to the integer
object, not to a std::atomic<int> which contains it.

>Or align as the corresponding atomic type (in case using an actual
>std::atomic<int> is undesirable).  OTOH the proposed code isn't
>any more bogus than the previous ...

The previous code accounts for the fact that ptrdiff_t is a typedef
for an unspecified type, and that some ABIs allow struct members to have
weaker alignment than they would have otherwise.

e.g. __alignof__(long long) != alignof(long long) on x86.

Yes, I know ptrdiff_t isn't long long on x86, but since it's a typedef
for some target-specific type, it's still possible that
alignof != __alignof__. So alignas(__alignof__(T)) is not necessarily
a no-op. So not bogus.

For int, there shouldn't be any need to force the alignment. I don't
think any ABI supported by GCC allows int members to be aligned to
less than __alignof__(int). Users could break it by compiling with
#pragma pack, but I have no sympathy for such silliness.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-03-03 14:56               ` Jonathan Wakely
@ 2021-03-03 15:02                 ` Andreas Schwab
  2021-03-03 15:10                 ` Jonathan Wakely
  2021-03-03 15:37                 ` Hans-Peter Nilsson
  2 siblings, 0 replies; 73+ messages in thread
From: Andreas Schwab @ 2021-03-03 15:02 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Richard Biener, Hans-Peter Nilsson, Thiago Macieira, libstdc++,
	GCC Patches

On Mär 03 2021, Jonathan Wakely wrote:

> For int, there shouldn't be any need to force the alignment. I don't
> think any ABI supported by GCC allows int members to be aligned to
> less than __alignof__(int).

There is no requirement that __alignof__(int) is big enough.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-03-03 14:56               ` Jonathan Wakely
  2021-03-03 15:02                 ` Andreas Schwab
@ 2021-03-03 15:10                 ` Jonathan Wakely
  2021-03-03 15:37                 ` Hans-Peter Nilsson
  2 siblings, 0 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 15:10 UTC (permalink / raw)
  To: Richard Biener
  Cc: Hans-Peter Nilsson, Thiago Macieira, Andreas Schwab, libstdc++,
	GCC Patches

On 03/03/21 14:56 +0000, Jonathan Wakely wrote:
>On 01/03/21 09:56 +0100, Richard Biener via Libstdc++ wrote:
>>On Sun, Feb 28, 2021 at 10:53 PM Hans-Peter Nilsson <hp@bitrange.com> wrote:
>>>
>>>
>>>
>>>On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
>>>
>>>> On Friday, 26 February 2021 11:31:00 PST Andreas Schwab wrote:
>>>> > On Feb 26 2021, Thiago Macieira wrote:
>>>> > > On Friday, 26 February 2021 10:14:42 PST Andreas Schwab wrote:
>>>> > >> On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
>>>> > >> > -    alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
>>>> > >> > +    alignas(__alignof__(int)) int _M_a;
>>>> > >>
>>>> > >> Futexes must be aligned to 4 bytes.
>>>> > >
>>>> > > Agreed, but doesn't this accomplish that?
>>>> >
>>>> > No.  It uses whatever alignment the type already has, and is an
>>>> > elaborate no-op.
>>>>
>>>> I thought so too when I read the original line. But I expected it was written
>>>> like that for a reason, especially since the same pattern appears in other
>>>> places.
>>>>
>>>> I can change to "alignas(4)" (which is a GCC extension, I believe). Is that
>>>> the correct solution?

It's not a GCC extensions. You're thinking of alignas(obj) which is a
GCC extension.

>>>IMNSHO make use of the corresponding atomic type.  Then there'd
>>>be no need for separate what's-the-right-align-curse games.
>
>That won't work though, because we need direct access to the integer
>object, not to a std::atomic<int> which contains it.
>
>>Or align as the corresponding atomic type (in case using an actual
>>std::atomic<int> is undesirable).  OTOH the proposed code isn't
>>any more bogus than the previous ...
>
>The previous code accounts for the fact that ptrdiff_t is a typedef
>for an unspecified type, and that some ABIs allow struct members to have
>weaker alignment than they would have otherwise.
>
>e.g. __alignof__(long long) != alignof(long long) on x86.
>
>Yes, I know ptrdiff_t isn't long long on x86, but since it's a typedef
>for some target-specific type, it's still possible that
>alignof != __alignof__. So alignas(__alignof__(T)) is not necessarily
>a no-op. So not bogus.
>
>For int, there shouldn't be any need to force the alignment. I don't
>think any ABI supported by GCC allows int members to be aligned to
>less than __alignof__(int). Users could break it by compiling with
>#pragma pack, but I have no sympathy for such silliness.

Jakub said on IRC that m68k might have alignof(int) == 2, so we'd need
to increase that alignment to use it as a futex.

N.B. std::atomic and std::atomic_ref don't use alignas(__alignof__(T))
they do this instead:

       static_assert(is_trivially_copyable_v<_Tp>);

       // 1/2/4/8/16-byte types must be aligned to at least their size.
       static constexpr int _S_min_alignment
	= (sizeof(_Tp) & (sizeof(_Tp) - 1)) || sizeof(_Tp) > 16
	? 0 : sizeof(_Tp);

     public:

       static constexpr size_t required_alignment
	= _S_min_alignment > alignof(_Tp) ? _S_min_alignment : alignof(_Tp);

Using std::atomic_ref<T>::required_alignment would give that value.

For something being used as a futex we should just use alignas(4)
though, since that's what the kernel requires.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-03-03 14:56               ` Jonathan Wakely
  2021-03-03 15:02                 ` Andreas Schwab
  2021-03-03 15:10                 ` Jonathan Wakely
@ 2021-03-03 15:37                 ` Hans-Peter Nilsson
  2 siblings, 0 replies; 73+ messages in thread
From: Hans-Peter Nilsson @ 2021-03-03 15:37 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Richard Biener, Thiago Macieira, Andreas Schwab, libstdc++, GCC Patches

On Wed, 3 Mar 2021, Jonathan Wakely wrote:
> For int, there shouldn't be any need to force the alignment. I don't
> think any ABI supported by GCC allows int members to be aligned to
> less than __alignof__(int).

(sizeof(int) last)

The CRIS ABI does as in default packed, and ISTR there was some
other gcc port too, but I don't recall whether that (too) had no
(current) Linux port.

brgds, H-P

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int
  2021-03-03 14:34     ` Jonathan Wakely
@ 2021-03-03 16:21       ` Jonathan Wakely
  2021-03-03 17:27         ` Thiago Macieira
  2021-03-03 17:34         ` Ville Voutilainen
  0 siblings, 2 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 16:21 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: libstdc++, gcc-patches

On 03/03/21 14:34 +0000, Jonathan Wakely wrote:
>On 26/02/21 07:59 -0800, Thiago Macieira via Libstdc++ wrote:
>>The kernel doesn't care what we store in those 32 bits, only that they are
>>comparable. This commit adds:
>>* pointers and long on 32-bit architectures
>>* unsigned
>>* untyped enums and typed enums on int & unsigned int
>>* float
>>
>>We're not using FUTEX_OP anywhere today. The kernel reserves 4 bits for
>>this field but has only used 6 values so far, so it can be extended to
>>unsigned compares in the future, if needed.
>
>And this one for GCC 11 too.
>
>>libstdc++-v3/include/bits/atomic_wait.h | 7 ++++---
>>1 file changed, 4 insertions(+), 3 deletions(-)
>>
>>diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
>>index 424fccbe4c5..1c6bda2e2b6 100644
>>--- a/libstdc++-v3/include/bits/atomic_wait.h
>>+++ b/libstdc++-v3/include/bits/atomic_wait.h
>>@@ -65,7 +65,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>    template<typename _Tp>
>>      inline constexpr bool __platform_wait_uses_type
>>#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>>-	= is_same_v<remove_cv_t<_Tp>, __platform_wait_t>;
>>+	= is_scalar_v<remove_cv_t<_Tp>> && sizeof(_Tp) == sizeof(__platform_wait_t)

Oh, except that is_scalar is surprisingly expensive to instantiate
(its defined in a really expensive way) and since we control all uses
of this constant, I don't think we need it. It's only ever used from
atomic waiting functions which are only defined for scalar types.

>>+	  && alignof(_Tp) >= alignof(__platform_wait_t);
>>#else
>>	= false;
>>#endif
>>@@ -91,13 +92,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>
>>    template<typename _Tp>
>>      void
>>-      __platform_wait(const _Tp* __addr, __platform_wait_t __val) noexcept
>>+      __platform_wait(const _Tp* __addr, _Tp __val) noexcept
>>      {
>>	for(;;)
>>	  {
>>	    auto __e = syscall (SYS_futex, static_cast<const void*>(__addr),
>>				  static_cast<int>(__futex_wait_flags::__wait_private),
>>-				    __val, nullptr);
>>+				  static_cast<__platform_wait_t>(__val), nullptr);
>>	    if (!__e || errno == EAGAIN)
>>	      break;
>>	    else if (errno != EINTR)
>>-- 
>>2.30.1
>>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-03-03 14:30                   ` Jonathan Wakely
@ 2021-03-03 17:07                     ` Thiago Macieira
  2021-03-03 17:14                       ` Jonathan Wakely
  0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-03-03 17:07 UTC (permalink / raw)
  To: Ville Voutilainen, Jonathan Wakely; +Cc: Thomas Rodgers, libstdc++

On Wednesday, 3 March 2021 06:30:20 PST Jonathan Wakely wrote:
> Yes, exactly. They could still do that if some additional option/macro
> was needed to get to those new facilities though.
> 
> My concern is that if we add a new "I know what I'm doing" option,
> everybody defines it (because that's what blog posts and youtube
> videos and conference talks tell them to do, because that gives them
> the new shiny things) and then it just becomes a default.
> 
> Maybe Qt wouldn't define it, but plenty would.

Well, if you define the "void warranty" option and you break things, you get 
to keep both pieces.

Another option is to fiddle with the include path. 

$ gcc -v -E -xc++ /dev/null
Using built-in specs.
[...]
#include <...> search starts here:
 /usr/lib64/gcc/x86_64-generic-linux/10/../../../../include/c++/10
 /usr/lib64/gcc/x86_64-generic-linux/10/../../../../include/c++/10/x86_64-
generic-linux
 /usr/lib64/gcc/x86_64-generic-linux/10/../../../../include/c++/10/backward
 /usr/lib64/gcc/x86_64-generic-linux/10/include
 /usr/local/include
 /usr/lib64/gcc/x86_64-generic-linux/10/include-fixed
 /usr/include

That "backward" has always got me thinking. What if we had a command-line 
option like -fexperimental-c++ option that added the $CXXROOT/experimental 
(complete with its bits/ and $ARCH subdirs)? That way, the regular 
atomic_base.h could do:

#if __cplusplus > 201703L && __has_include(<bits/atomic_wait.h>)

And the second conditional would fail unless the option is provided.

The advantage of a compiler option is that you need each DSO to consciously 
add something to their build systems and that can't be done by header-only 
libraries. That's like MSVC's /std:c++latest, which is documented by MS as 
"here there be dragons".

GCC can also generate some extra bit of assembly to mark the final executable 
as using experimental API, so distributors can easily tell that unfinished API 
was used. I've successfully done this for Qt using ELF versions.

$ rpm -qR kwin5 | grep PRIVATE
libQt5Core.so.5(Qt_5.15.2_PRIVATE_API)(64bit)
libQt5Gui.so.5(Qt_5.15.2_PRIVATE_API)(64bit)

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-03-03 14:24         ` Jonathan Wakely
@ 2021-03-03 17:12           ` Thiago Macieira
  0 siblings, 0 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-03-03 17:12 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Ville Voutilainen, Thomas Rodgers, libstdc++

On Wednesday, 3 March 2021 06:24:45 PST Jonathan Wakely wrote:
> But it's not a new problem for GCC 11, and not specific to the atomic
> wait feature. It's worth further consideration though.

It isn't, except that in this case we know, before release, that ABI will 
silently break in ways that may cause heisenbugs and deadlocks. Given that we 
know that, I'm asking for a conscious decision on one of three options:

1) make the functionality unavailable (by default or at all)
2) make the ABI break non-silent when it comes
3) accept the silent ABI break

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-03-03 14:34   ` Jonathan Wakely
@ 2021-03-03 17:14     ` Thiago Macieira
  2021-03-03 17:18       ` Jonathan Wakely
  0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-03-03 17:14 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches

On Wednesday, 3 March 2021 06:34:26 PST Jonathan Wakely wrote:
> On 26/02/21 07:59 -0800, Thiago Macieira via Libstdc++ wrote:
> >ints can be used as futex on Linux. ptrdiff_t on 64-bit Linux can't.
> 
> Yes, we should do this for GCC 11.

Want me to re-submit this one alone, with the "alignas(4)" with a commend 
indicating it's what the kernel requires?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: C++2a synchronisation inefficient in GCC 11
  2021-03-03 17:07                     ` Thiago Macieira
@ 2021-03-03 17:14                       ` Jonathan Wakely
  0 siblings, 0 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 17:14 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: Ville Voutilainen, Thomas Rodgers, libstdc++

On 03/03/21 09:07 -0800, Thiago Macieira via Libstdc++ wrote:
>On Wednesday, 3 March 2021 06:30:20 PST Jonathan Wakely wrote:
>> Yes, exactly. They could still do that if some additional option/macro
>> was needed to get to those new facilities though.
>>
>> My concern is that if we add a new "I know what I'm doing" option,
>> everybody defines it (because that's what blog posts and youtube
>> videos and conference talks tell them to do, because that gives them
>> the new shiny things) and then it just becomes a default.
>>
>> Maybe Qt wouldn't define it, but plenty would.
>
>Well, if you define the "void warranty" option and you break things, you get
>to keep both pieces.
>
>Another option is to fiddle with the include path.
>
>$ gcc -v -E -xc++ /dev/null
>Using built-in specs.
>[...]
>#include <...> search starts here:
> /usr/lib64/gcc/x86_64-generic-linux/10/../../../../include/c++/10
> /usr/lib64/gcc/x86_64-generic-linux/10/../../../../include/c++/10/x86_64-
>generic-linux
> /usr/lib64/gcc/x86_64-generic-linux/10/../../../../include/c++/10/backward
> /usr/lib64/gcc/x86_64-generic-linux/10/include
> /usr/local/include
> /usr/lib64/gcc/x86_64-generic-linux/10/include-fixed
> /usr/include
>
>That "backward" has always got me thinking. What if we had a command-line
>option like -fexperimental-c++ option that added the $CXXROOT/experimental

We already have an experimental sub-dir for TS headers, so you need to
pick another name. Otherwise <vector> and <experimental/vector> get
mixed up.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
  2021-03-03 17:14     ` Thiago Macieira
@ 2021-03-03 17:18       ` Jonathan Wakely
  0 siblings, 0 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 17:18 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: libstdc++, gcc-patches

On 03/03/21 09:14 -0800, Thiago Macieira via Libstdc++ wrote:
>On Wednesday, 3 March 2021 06:34:26 PST Jonathan Wakely wrote:
>> On 26/02/21 07:59 -0800, Thiago Macieira via Libstdc++ wrote:
>> >ints can be used as futex on Linux. ptrdiff_t on 64-bit Linux can't.
>>
>> Yes, we should do this for GCC 11.
>
>Want me to re-submit this one alone, with the "alignas(4)" with a commend
>indicating it's what the kernel requires?

Yes please.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int
  2021-03-03 16:21       ` Jonathan Wakely
@ 2021-03-03 17:27         ` Thiago Macieira
  2021-03-03 17:34         ` Ville Voutilainen
  1 sibling, 0 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-03-03 17:27 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches

On Wednesday, 3 March 2021 08:21:51 PST Jonathan Wakely wrote:
> >>-       = is_same_v<remove_cv_t<_Tp>, __platform_wait_t>;
> >>+       = is_scalar_v<remove_cv_t<_Tp>> && sizeof(_Tp) ==
> >>sizeof(__platform_wait_t)
> Oh, except that is_scalar is surprisingly expensive to instantiate
> (its defined in a really expensive way) and since we control all uses
> of this constant, I don't think we need it. It's only ever used from
> atomic waiting functions which are only defined for scalar types.

Thanks. Will update and re-submit.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int
  2021-03-03 16:21       ` Jonathan Wakely
  2021-03-03 17:27         ` Thiago Macieira
@ 2021-03-03 17:34         ` Ville Voutilainen
  2021-03-03 17:41           ` Jonathan Wakely
  1 sibling, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-03-03 17:34 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Thiago Macieira, libstdc++, gcc-patches List

On Wed, 3 Mar 2021 at 19:25, Jonathan Wakely via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
> Oh, except that is_scalar is surprisingly expensive to instantiate
> (its defined in a really expensive way) and since we control all uses

I'll be more than happy to write you an __is_scalar for GCC 12. :P

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int
  2021-03-03 17:34         ` Ville Voutilainen
@ 2021-03-03 17:41           ` Jonathan Wakely
  0 siblings, 0 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 17:41 UTC (permalink / raw)
  To: Ville Voutilainen; +Cc: Thiago Macieira, libstdc++, gcc-patches List

On 03/03/21 19:34 +0200, Ville Voutilainen wrote:
>On Wed, 3 Mar 2021 at 19:25, Jonathan Wakely via Libstdc++
><libstdc++@gcc.gnu.org> wrote:
>> Oh, except that is_scalar is surprisingly expensive to instantiate
>> (its defined in a really expensive way) and since we control all uses
>
>I'll be more than happy to write you an __is_scalar for GCC 12. :P

That would help with https://gcc.gnu.org/PR96710 too. The current
implementation is inefficient and arguably wrong.



^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2021-03-03 17:41 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-25 22:50 C++2a synchronisation inefficient in GCC 11 Thiago Macieira
2021-02-26 11:19 ` Jonathan Wakely
2021-02-26 17:37   ` Thiago Macieira
2021-02-26 18:29     ` Jonathan Wakely
2021-02-26 19:30       ` Ville Voutilainen
2021-02-26 21:17         ` Jonathan Wakely
2021-02-26 21:18           ` Ville Voutilainen
2021-02-26 21:39             ` Jonathan Wakely
2021-02-26 18:47     ` Ville Voutilainen
2021-02-26 23:53       ` Thiago Macieira
2021-02-26 23:58         ` Ville Voutilainen
2021-02-27  0:11           ` Thiago Macieira
2021-02-27  0:18             ` Ville Voutilainen
2021-02-27  0:36               ` Thiago Macieira
2021-02-27  0:44                 ` Ville Voutilainen
2021-02-27  0:53                   ` Thiago Macieira
2021-02-27  1:03                     ` Ville Voutilainen
2021-03-03 14:30                   ` Jonathan Wakely
2021-03-03 17:07                     ` Thiago Macieira
2021-03-03 17:14                       ` Jonathan Wakely
2021-02-27  0:22             ` Marc Glisse
2021-02-27  0:30               ` Ville Voutilainen
2021-02-27  0:43               ` Thiago Macieira
2021-03-03 14:24         ` Jonathan Wakely
2021-03-03 17:12           ` Thiago Macieira
2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
2021-02-26 15:59   ` [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int Thiago Macieira
2021-03-03 14:34     ` Jonathan Wakely
2021-03-03 16:21       ` Jonathan Wakely
2021-03-03 17:27         ` Thiago Macieira
2021-03-03 17:34         ` Ville Voutilainen
2021-03-03 17:41           ` Jonathan Wakely
2021-02-26 15:59   ` [PATCH 3/5] std::__atomic_wait: don't use __detail::__waiter with futex Thiago Macieira
2021-02-26 15:59   ` [PATCH 4/5] barrier: use int instead of unsigned char for the phase state Thiago Macieira
2021-02-28 15:05     ` Hans-Peter Nilsson
2021-03-01 16:28       ` Thomas Rodgers
2021-03-01 17:24       ` Thiago Macieira
2021-03-01 17:38         ` Thomas Rodgers
2021-03-01 17:40           ` Thomas Rodgers
2021-03-01 18:06           ` Thiago Macieira
2021-03-01 19:08             ` Thomas Rodgers
2021-03-01 18:12         ` Ville Voutilainen
2021-03-01 19:44           ` Thiago Macieira
2021-03-01 20:35             ` Ville Voutilainen
2021-03-01 21:54               ` Thiago Macieira
2021-03-01 22:04                 ` Ville Voutilainen
2021-03-01 22:21                   ` Thiago Macieira
2021-03-01 22:31                     ` Ville Voutilainen
2021-03-01 22:40                       ` Thiago Macieira
2021-02-26 15:59   ` [PATCH 5/5] barrier: optimise by not having the hasher in a loop Thiago Macieira
2021-03-03 14:36     ` Jonathan Wakely
2021-02-26 18:14   ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Andreas Schwab
2021-02-26 19:08     ` Thiago Macieira
2021-02-26 19:31       ` Andreas Schwab
2021-02-27  0:13         ` Thiago Macieira
2021-02-28 21:31           ` Hans-Peter Nilsson
2021-03-01  8:56             ` Richard Biener
2021-03-03 14:56               ` Jonathan Wakely
2021-03-03 15:02                 ` Andreas Schwab
2021-03-03 15:10                 ` Jonathan Wakely
2021-03-03 15:37                 ` Hans-Peter Nilsson
2021-03-01 16:32             ` Thomas Rodgers
2021-03-03 14:34   ` Jonathan Wakely
2021-03-03 17:14     ` Thiago Macieira
2021-03-03 17:18       ` Jonathan Wakely
2021-02-27  1:13 ` C++2a synchronisation inefficient in GCC 11 Thomas Rodgers
2021-02-27  1:29   ` Thomas Rodgers
2021-02-27  3:01     ` Thomas Rodgers
2021-03-01 17:46       ` Thomas Rodgers
2021-03-01 18:00         ` Thiago Macieira
2021-03-01 18:34           ` Thomas Rodgers
2021-03-01 19:11             ` Thiago Macieira
2021-02-27  2:02   ` Ville Voutilainen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).