* C++2a synchronisation inefficient in GCC 11
@ 2021-02-25 22:50 Thiago Macieira
2021-02-26 11:19 ` Jonathan Wakely
` (2 more replies)
0 siblings, 3 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-02-25 22:50 UTC (permalink / raw)
To: libstdc++; +Cc: trodgers, jwakely
Hello all
I was investigating the implementation of std::atomic<T>::wait to use inside
my own code (Qt's QMutex and QSemaphore), when I realised that it has
huge inefficiencies. Since everything in this implementation is inline and,
once released, it will tie our hands until the next ABI break (libstdc++.so.
7). And no, inline namespaces and ABI tags are no different than ABI breaks,
they only make the ABI break more or less silent in the process.
Here's a summary of the findings:
1) everything is inline
2) futex code is still behind a lot of code calling into std::_Hash_bytes
3) other int-sized (incl. enums) atomics don't use futex
4) std::latch and std::counting_semaphore defaults preclude from using
futex on Linux
5) std::barrier implementation also uses a type that futex(2) can't handle
Here's what the code currently looks like: https://gcc.godbolt.org/z/MeP8rr
Given how far we are in the release cycle, a simple solution is to simply
disable the entire solution by #if 0'ing out bits/atomic_wait.h. If
__cpp_lib_atomic_wait isn't defined, the rest of the functionality cascades
disabled too. Then we can spend however long is necessary to get a good
implementation for GCC 12.
I have patches for all but the first issue, but they are a result of only 3
hours of hacking to prove the concept. I will post them, but I do not expect
them to get merged as-is.
Details:
1) everything is inline
I think this is unnecessary optimisation, especially since it expands to a LOT
of code (see link above). A cursory research into libc++ and MS STL shows that
the actual wait and notification are out-of-line. I recommend we do the same
for libstdc++: if we're going to syscall anyway, the extra function call isn't
going to be the performance bottleneck.
In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050360.html>, Jonathan
wrote:
> Eventually we'll want to move the rest of this function (which doesn't
> depend on the template argument) into the compiled library, but it's
> better to be header-only for now.
But that has never happened.
Having out-of-line implementations would allow the implementation for generic-
sized atomic wait/notify to fine-tune as time and experience progresses. The
std::__detail::__waiter implementation looks clever, but it's unclear if it is
the best for all workloads. Moreover, the implementation is limited to 16
waiters.
Moreover, it would allow us to have support in the future for a 64-bit futex.
Linus did shoot down any need for it 13 years ago, but the world has changed
since and it's not inconceivable to have a pointer-sized futex on every
platform, which on 64-bit would be 64-bit. Windows and Darwin do have support
for 64-bit compare-and-wait functionality.
And it would allow us to have futex-equivalent support in other OSes, added at
our leisure. The current implementation does not support Windows's
WaitOnAddress and could never support it if released in the current state.
It's also possible other OSes will eventually add equivalent functionality
because of the C++ standard. For example, FreeBSD already has futex() support
inside its kernel in order to run Linux binaries, but it is not exposed to the
FreeBSD-personality syscall ABI.
Inlining certain functionality at a later date is possible. De-inlining isn't.
2) futex code for int is still behind a lot of boiler plate
It comes from the creation/use of std::__detail::__waiter in
std::__atomic_wait and __atomic_notify. That does allow one to detect whether
there is contention in the first place and avoid the system call, but I think
that's premature optimisation.
In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050362.html>, Thomas
says:
> __atomic_notify checks to see if that count is non-zero
> before invoking the platform/semaphore notify because it is cheaper
> to do the atomic load than it is to make the syscall() when there are no
> waiters.
That's true, but it ignores the fact that users of std::atomic<T>::wait() are
likely to track contention by themselves on the value of the atomic (I know I
do). libc++, MS STL and the original atomic_wait[1] implementation operate on
direct system calls and don't try to track contention. Because of that,
libstdc++ adding an extra layer of tracking is double book-keeping and
violates "don't pay for what you don't need".
[1] https://github.com/ogiroux/atomic_wait
3) other int-sized atomics don't use futex
The kernel doesn't care what we store in those 32 bits, only that they are
comparable (FUTEX_OP does provide less-than and greater-than operations that
are signed, but we don't use that and the kernel API can be later extended to
unsigned compares). Therefore, I would recommend we extend futex support to
all 32-bit signed scalar types. This would add:
* pointers and long on 32-bit architectures
* unsigned
* untyped enums and typed enums on int & unsigned int
* float
4) std::latch and std::counting_semaphore defaults too high
The standard requires a ptrdiff_t in the API, but it does allow us to set hard
limits and/or defaults for best performance. Right now, std::latch is
implemented using ptrdiff_t internally and std::counting_semaphore's default
limit uses PTRDIFF_MAX:
template<ptrdiff_t __least_max_value =
__gnu_cxx::__int_traits<ptrdiff_t>::__max>
class counting_semaphore
As a consequence, those two classes will use 64-bit atomics on 64-bit
architectures, which means they will not use futex. I recommend changing
std::latch's internal to int and lowering the default maximum in
std::counting_semaphore to INT_MAX.
5) std::barrier uses a type that futex can't handle
The barrier phase is the actual type that gets waited / notified on, but it is
an enum class on unsigned char:
enum class __barrier_phase_t : unsigned char { };
This is a harder problem to solve, but I really do recommend making it 32-bit
in width. Whether each barrier phase holds only 256 values or the full 32
bits, I don't know (I've implemented with wrapping on 256).
What is of consequence is that the __state_t type could increase in size
("could" because we can simply reduce the number of tickets in each state
entry). I haven't done enough research on this to know whether it is the right
way to go.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-25 22:50 C++2a synchronisation inefficient in GCC 11 Thiago Macieira
@ 2021-02-26 11:19 ` Jonathan Wakely
2021-02-26 17:37 ` Thiago Macieira
2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
2021-02-27 1:13 ` C++2a synchronisation inefficient in GCC 11 Thomas Rodgers
2 siblings, 1 reply; 73+ messages in thread
From: Jonathan Wakely @ 2021-02-26 11:19 UTC (permalink / raw)
To: Thiago Macieira; +Cc: libstdc++, Thomas Rodgers, Jonathan Wakely
On Thu, 25 Feb 2021 at 23:28, Thiago Macieira via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
>
> Hello all
>
> I was investigating the implementation of std::atomic<T>::wait to use inside
> my own code (Qt's QMutex and QSemaphore), when I realised that it has
> huge inefficiencies. Since everything in this implementation is inline and,
> once released, it will tie our hands until the next ABI break (libstdc++.so.
That's not true. By not committing to exporting these symbols from
libstdc++.so now, we can move them into the library later once the
design is stable.
C++20 support is experimental and not stable. If you compile it with
GCC 11 you can't expect to link it to C++20 code compiled with GCC 12.
So for GCC 12 (or a later version) we could replace the inline
functions with non-inline code in libstdc++.so and commit to that
being stable.
So actually this gives us more freedom to change it later.
> 7). And no, inline namespaces and ABI tags are no different than ABI breaks,
> they only make the ABI break more or less silent in the process.
>
> Here's a summary of the findings:
>
> 1) everything is inline
> 2) futex code is still behind a lot of code calling into std::_Hash_bytes
> 3) other int-sized (incl. enums) atomics don't use futex
Yup, including unsigned ints, and longs on ILP32 targets.
> 4) std::latch and std::counting_semaphore defaults preclude from using
> futex on Linux
Yes, I've suggested that we should use INT_MAX (or maybe UINT_MAX) as
the default, so they fit in the futex range.
I'm not working today, so I'll read the rest and respond to the test
another time.
tl;dr this code is experimental and expected to change, so is inline
for a reason.
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-02-25 22:50 C++2a synchronisation inefficient in GCC 11 Thiago Macieira
2021-02-26 11:19 ` Jonathan Wakely
@ 2021-02-26 15:59 ` Thiago Macieira
2021-02-26 15:59 ` [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int Thiago Macieira
` (5 more replies)
2021-02-27 1:13 ` C++2a synchronisation inefficient in GCC 11 Thomas Rodgers
2 siblings, 6 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 15:59 UTC (permalink / raw)
To: libstdc++; +Cc: gcc-patches
ints can be used as futex on Linux. ptrdiff_t on 64-bit Linux can't.
---
libstdc++-v3/include/std/latch | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libstdc++-v3/include/std/latch b/libstdc++-v3/include/std/latch
index ef8c301e5e9..156aea5c5e5 100644
--- a/libstdc++-v3/include/std/latch
+++ b/libstdc++-v3/include/std/latch
@@ -48,7 +48,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
public:
static constexpr ptrdiff_t
max() noexcept
- { return __gnu_cxx::__int_traits<ptrdiff_t>::__max; }
+ { return __gnu_cxx::__int_traits<int>::__max; }
constexpr explicit latch(ptrdiff_t __expected) noexcept
: _M_a(__expected) { }
@@ -85,7 +85,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
private:
- alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
+ alignas(__alignof__(int)) int _M_a;
};
_GLIBCXX_END_NAMESPACE_VERSION
} // namespace
--
2.30.1
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int
2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
@ 2021-02-26 15:59 ` Thiago Macieira
2021-03-03 14:34 ` Jonathan Wakely
2021-02-26 15:59 ` [PATCH 3/5] std::__atomic_wait: don't use __detail::__waiter with futex Thiago Macieira
` (4 subsequent siblings)
5 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 15:59 UTC (permalink / raw)
To: libstdc++; +Cc: gcc-patches
The kernel doesn't care what we store in those 32 bits, only that they are
comparable. This commit adds:
* pointers and long on 32-bit architectures
* unsigned
* untyped enums and typed enums on int & unsigned int
* float
We're not using FUTEX_OP anywhere today. The kernel reserves 4 bits for
this field but has only used 6 values so far, so it can be extended to
unsigned compares in the future, if needed.
---
libstdc++-v3/include/bits/atomic_wait.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
index 424fccbe4c5..1c6bda2e2b6 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -65,7 +65,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
template<typename _Tp>
inline constexpr bool __platform_wait_uses_type
#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
- = is_same_v<remove_cv_t<_Tp>, __platform_wait_t>;
+ = is_scalar_v<remove_cv_t<_Tp>> && sizeof(_Tp) == sizeof(__platform_wait_t)
+ && alignof(_Tp) >= alignof(__platform_wait_t);
#else
= false;
#endif
@@ -91,13 +92,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
template<typename _Tp>
void
- __platform_wait(const _Tp* __addr, __platform_wait_t __val) noexcept
+ __platform_wait(const _Tp* __addr, _Tp __val) noexcept
{
for(;;)
{
auto __e = syscall (SYS_futex, static_cast<const void*>(__addr),
static_cast<int>(__futex_wait_flags::__wait_private),
- __val, nullptr);
+ static_cast<__platform_wait_t>(__val), nullptr);
if (!__e || errno == EAGAIN)
break;
else if (errno != EINTR)
--
2.30.1
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH 3/5] std::__atomic_wait: don't use __detail::__waiter with futex
2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
2021-02-26 15:59 ` [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int Thiago Macieira
@ 2021-02-26 15:59 ` Thiago Macieira
2021-02-26 15:59 ` [PATCH 4/5] barrier: use int instead of unsigned char for the phase state Thiago Macieira
` (3 subsequent siblings)
5 siblings, 0 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 15:59 UTC (permalink / raw)
To: libstdc++; +Cc: gcc-patches
That violates "don't pay for what you don't need" rule of C++. Users of
std::atomic<T>::wait will often have some bit in their atomic indicate
whether the value is contended or not, so we don't need libstdc++ to do
double book-keeping for us.
---
libstdc++-v3/include/bits/atomic_wait.h | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
index 1c6bda2e2b6..4d240f44faf 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -258,14 +258,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if (std::__atomic_spin(__pred))
return;
- __waiter __w(__addr);
- while (!__pred())
+ if constexpr (__platform_wait_uses_type<_Tp>)
{
- if constexpr (__platform_wait_uses_type<_Tp>)
- {
- __platform_wait(__addr, __old);
- }
- else
+ __platform_wait(__addr, __old);
+ }
+ else
+ {
+ __waiter __w(__addr);
+ while (!__pred())
{
// TODO support timed backoff when this can be moved into the lib
__w._M_do_wait();
@@ -274,13 +274,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
template<typename _Tp>
- void
+ inline void
__atomic_notify(const _Tp* __addr, bool __all) noexcept
{
using namespace __detail;
- auto& __w = __waiters::_S_for((void*)__addr);
- if (!__w._M_waiting())
- return;
#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
if constexpr (__platform_wait_uses_type<_Tp>)
@@ -290,6 +287,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
else
#endif
{
+ auto& __w = __waiters::_S_for((void*)__addr);
+ if (!__w._M_waiting())
+ return;
__w._M_notify(__all);
}
}
--
2.30.1
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
2021-02-26 15:59 ` [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int Thiago Macieira
2021-02-26 15:59 ` [PATCH 3/5] std::__atomic_wait: don't use __detail::__waiter with futex Thiago Macieira
@ 2021-02-26 15:59 ` Thiago Macieira
2021-02-28 15:05 ` Hans-Peter Nilsson
2021-02-26 15:59 ` [PATCH 5/5] barrier: optimise by not having the hasher in a loop Thiago Macieira
` (2 subsequent siblings)
5 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 15:59 UTC (permalink / raw)
To: libstdc++; +Cc: gcc-patches
ints can be used in futexes. chars can't.
---
libstdc++-v3/include/std/barrier | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
index e09212dfcb9..ae058bd3dc3 100644
--- a/libstdc++-v3/include/std/barrier
+++ b/libstdc++-v3/include/std/barrier
@@ -70,7 +70,7 @@ It looks different from literature pseudocode for two main reasons:
*/
- enum class __barrier_phase_t : unsigned char { };
+ enum class __barrier_phase_t : int { };
template<typename _CompletionF>
class __tree_barrier
@@ -93,20 +93,24 @@ It looks different from literature pseudocode for two main reasons:
alignas(__phase_alignment) __barrier_phase_t _M_phase;
+ static __barrier_phase_t
+ _S_add_to_phase(__barrier_phase_t __phase, unsigned char __n)
+ {
+ __n += static_cast<unsigned char>(__phase);
+ return static_cast<__barrier_phase_t>(__n);
+ }
+
bool
_M_arrive(__barrier_phase_t __old_phase)
{
- const auto __old_phase_val = static_cast<unsigned char>(__old_phase);
- const auto __half_step =
- static_cast<__barrier_phase_t>(__old_phase_val + 1);
- const auto __full_step =
- static_cast<__barrier_phase_t>(__old_phase_val + 2);
-
size_t __current_expected = _M_expected;
std::hash<std::thread::id> __hasher;
size_t __current = __hasher(std::this_thread::get_id())
% ((_M_expected + 1) >> 1);
+ const auto __half_step = _S_add_to_phase(__old_phase, 1);
+ const auto __full_step = _S_add_to_phase(__old_phase, 2);
+
for (int __round = 0; ; ++__round)
{
if (__current_expected <= 1)
@@ -165,7 +169,6 @@ It looks different from literature pseudocode for two main reasons:
{
__atomic_phase_ref_t __phase(_M_phase);
const auto __old_phase = __phase.load(memory_order_relaxed);
- const auto __cur = static_cast<unsigned char>(__old_phase);
for(; __update; --__update)
{
if(_M_arrive(__old_phase))
@@ -173,7 +176,7 @@ It looks different from literature pseudocode for two main reasons:
_M_completion();
_M_expected += _M_expected_adjustment.load(memory_order_relaxed);
_M_expected_adjustment.store(0, memory_order_relaxed);
- auto __new_phase = static_cast<__barrier_phase_t>(__cur + 2);
+ auto __new_phase = _S_add_to_phase(__old_phase, 2);
__phase.store(__new_phase, memory_order_release);
__phase.notify_all();
}
--
2.30.1
^ permalink raw reply [flat|nested] 73+ messages in thread
* [PATCH 5/5] barrier: optimise by not having the hasher in a loop
2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
` (2 preceding siblings ...)
2021-02-26 15:59 ` [PATCH 4/5] barrier: use int instead of unsigned char for the phase state Thiago Macieira
@ 2021-02-26 15:59 ` Thiago Macieira
2021-03-03 14:36 ` Jonathan Wakely
2021-02-26 18:14 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Andreas Schwab
2021-03-03 14:34 ` Jonathan Wakely
5 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 15:59 UTC (permalink / raw)
To: libstdc++; +Cc: gcc-patches
Our thread's ID does not change so we don't have to get it every time
and hash it every time.
---
libstdc++-v3/include/std/barrier | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
index ae058bd3dc3..eb31a89b175 100644
--- a/libstdc++-v3/include/std/barrier
+++ b/libstdc++-v3/include/std/barrier
@@ -101,12 +101,10 @@ It looks different from literature pseudocode for two main reasons:
}
bool
- _M_arrive(__barrier_phase_t __old_phase)
+ _M_arrive(__barrier_phase_t __old_phase, size_t __current)
{
size_t __current_expected = _M_expected;
- std::hash<std::thread::id> __hasher;
- size_t __current = __hasher(std::this_thread::get_id())
- % ((_M_expected + 1) >> 1);
+ __current %= ((__current_expected + 1) >> 1);
const auto __half_step = _S_add_to_phase(__old_phase, 1);
const auto __full_step = _S_add_to_phase(__old_phase, 2);
@@ -167,11 +165,13 @@ It looks different from literature pseudocode for two main reasons:
[[nodiscard]] arrival_token
arrive(ptrdiff_t __update)
{
+ std::hash<std::thread::id> __hasher;
+ size_t __current = __hasher(std::this_thread::get_id());
__atomic_phase_ref_t __phase(_M_phase);
const auto __old_phase = __phase.load(memory_order_relaxed);
for(; __update; --__update)
{
- if(_M_arrive(__old_phase))
+ if(_M_arrive(__old_phase, __current))
{
_M_completion();
_M_expected += _M_expected_adjustment.load(memory_order_relaxed);
--
2.30.1
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-26 11:19 ` Jonathan Wakely
@ 2021-02-26 17:37 ` Thiago Macieira
2021-02-26 18:29 ` Jonathan Wakely
2021-02-26 18:47 ` Ville Voutilainen
0 siblings, 2 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 17:37 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: libstdc++, Thomas Rodgers
On Friday, 26 February 2021 03:19:02 PST Jonathan Wakely wrote:
> > has huge inefficiencies. Since everything in this implementation is
> > inline and, once released, it will tie our hands until the next ABI break
> > (libstdc++.so.
> That's not true. By not committing to exporting these symbols from
> libstdc++.so now, we can move them into the library later once the
> design is stable.
Hello Jonathan
Thank you for replying.
The problem is not the export or not export, it's that the protocol/method is
locked down. We cannot change it in a future version, like changing the number
of __waiter objects in the __waiters::_S_for function. Since that code gets
inlined into every executable that uses it, future versions are bound to use
the exact same selection until we break ABI.
> C++20 support is experimental and not stable. If you compile it with
> GCC 11 you can't expect to link it to C++20 code compiled with GCC 12.
> So for GCC 12 (or a later version) we could replace the inline
> functions with non-inline code in libstdc++.so and commit to that
> being stable.
Oh? That's not a really widely understood fact. I see people using C++20 right
now all the time in the cpplang slack and no one seems to be worried that
support is experimental.
Still, what I'm proposing is that we minimise the fallout right now.
> > 4) std::latch and std::counting_semaphore defaults preclude from using
> > futex on Linux
>
> Yes, I've suggested that we should use INT_MAX (or maybe UINT_MAX) as
> the default, so they fit in the futex range.
Hmm... I didn't test increasing to UINT_MAX after my change to
__platform_wait_uses_type. I guess it should just work.
> I'm not working today, so I'll read the rest and respond to the test
> another time.
>
> tl;dr this code is experimental and expected to change, so is inline
> for a reason.
Thanks, but I still propose we hide it better.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
` (3 preceding siblings ...)
2021-02-26 15:59 ` [PATCH 5/5] barrier: optimise by not having the hasher in a loop Thiago Macieira
@ 2021-02-26 18:14 ` Andreas Schwab
2021-02-26 19:08 ` Thiago Macieira
2021-03-03 14:34 ` Jonathan Wakely
5 siblings, 1 reply; 73+ messages in thread
From: Andreas Schwab @ 2021-02-26 18:14 UTC (permalink / raw)
To: Thiago Macieira via Gcc-patches; +Cc: libstdc++, Thiago Macieira
On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
> @@ -85,7 +85,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> }
>
> private:
> - alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
> + alignas(__alignof__(int)) int _M_a;
Futexes must be aligned to 4 bytes.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-26 17:37 ` Thiago Macieira
@ 2021-02-26 18:29 ` Jonathan Wakely
2021-02-26 19:30 ` Ville Voutilainen
2021-02-26 18:47 ` Ville Voutilainen
1 sibling, 1 reply; 73+ messages in thread
From: Jonathan Wakely @ 2021-02-26 18:29 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++
On 26/02/21 09:37 -0800, Thiago Macieira via Libstdc++ wrote:
>On Friday, 26 February 2021 03:19:02 PST Jonathan Wakely wrote:
>> > has huge inefficiencies. Since everything in this implementation is
>> > inline and, once released, it will tie our hands until the next ABI break
>> > (libstdc++.so.
>> That's not true. By not committing to exporting these symbols from
>> libstdc++.so now, we can move them into the library later once the
>> design is stable.
>
>Hello Jonathan
>
>Thank you for replying.
>
>The problem is not the export or not export, it's that the protocol/method is
>locked down. We cannot change it in a future version, like changing the number
>of __waiter objects in the __waiters::_S_for function. Since that code gets
>inlined into every executable that uses it, future versions are bound to use
>the exact same selection until we break ABI.
Compiling C++20 code with GCC 11 and linking to C++20 code compiled
with GCC != 11 is unsupported. The ABI is expected to break for C++20
features.
We can change it. We WILL change it.
The world is not ending.
>> C++20 support is experimental and not stable. If you compile it with
>> GCC 11 you can't expect to link it to C++20 code compiled with GCC 12.
>> So for GCC 12 (or a later version) we could replace the inline
>> functions with non-inline code in libstdc++.so and commit to that
>> being stable.
>
>Oh? That's not a really widely understood fact. I see people using C++20 right
>now all the time in the cpplang slack and no one seems to be worried that
>support is experimental.
I can't really help it if people don't read documentation for
important tools they rely on.
https://gcc.gnu.org/projects/cxx-status.html#cxx20
https://gcc.gnu.org/gcc-8/changes.html#libstdcxx
https://gcc.gnu.org/gcc-9/changes.html#libstdcxx
https://gcc.gnu.org/gcc-10/changes.html#libstdcxx
https://gcc.gnu.org/gcc-11/changes.html#libstdcxx
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-26 17:37 ` Thiago Macieira
2021-02-26 18:29 ` Jonathan Wakely
@ 2021-02-26 18:47 ` Ville Voutilainen
2021-02-26 23:53 ` Thiago Macieira
1 sibling, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-26 18:47 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++
On Fri, 26 Feb 2021 at 20:43, Thiago Macieira via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
> Oh? That's not a really widely understood fact. I see people using C++20 right
> now all the time in the cpplang slack and no one seems to be worried that
> support is experimental.
People writing toys on forums is not real production use that we
should worry about, and anyone who uses
a two-months-old standard in production use is going to survive
possible breaking changes. Or if they won't,
that's their stupid fault.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-02-26 18:14 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Andreas Schwab
@ 2021-02-26 19:08 ` Thiago Macieira
2021-02-26 19:31 ` Andreas Schwab
0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 19:08 UTC (permalink / raw)
To: Thiago Macieira via Gcc-patches, Andreas Schwab; +Cc: libstdc++
On Friday, 26 February 2021 10:14:42 PST Andreas Schwab wrote:
> On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
> > @@ -85,7 +85,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > }
> >
> > private:
> > - alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
> > + alignas(__alignof__(int)) int _M_a;
>
> Futexes must be aligned to 4 bytes.
Agreed, but doesn't this accomplish that?
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-26 18:29 ` Jonathan Wakely
@ 2021-02-26 19:30 ` Ville Voutilainen
2021-02-26 21:17 ` Jonathan Wakely
0 siblings, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-26 19:30 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: Thiago Macieira, Thomas Rodgers, libstdc++
On Fri, 26 Feb 2021 at 21:26, Jonathan Wakely via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
> >Oh? That's not a really widely understood fact. I see people using C++20 right
> >now all the time in the cpplang slack and no one seems to be worried that
> >support is experimental.
>
> I can't really help it if people don't read documentation for
> important tools they rely on.
>
> https://gcc.gnu.org/projects/cxx-status.html#cxx20
Also, if it's not a sufficient hint that you need to explicitly tell
the compiler to enter C++20 mode, there's
nothing worth our time to protect such users from the perils of the
world. And trying to remain
ABI-compatible between experimental C++20 support and its subsequent
C++20 support absolutely isn't one
of the things worth our time, even for users that are not idiots. It's
experimental, and We Mean It.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-02-26 19:08 ` Thiago Macieira
@ 2021-02-26 19:31 ` Andreas Schwab
2021-02-27 0:13 ` Thiago Macieira
0 siblings, 1 reply; 73+ messages in thread
From: Andreas Schwab @ 2021-02-26 19:31 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Thiago Macieira via Gcc-patches, libstdc++
On Feb 26 2021, Thiago Macieira wrote:
> On Friday, 26 February 2021 10:14:42 PST Andreas Schwab wrote:
>> On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
>> > @@ -85,7 +85,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> > }
>> >
>> > private:
>> > - alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
>> > + alignas(__alignof__(int)) int _M_a;
>>
>> Futexes must be aligned to 4 bytes.
>
> Agreed, but doesn't this accomplish that?
No. It uses whatever alignment the type already has, and is an
elaborate no-op.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-26 19:30 ` Ville Voutilainen
@ 2021-02-26 21:17 ` Jonathan Wakely
2021-02-26 21:18 ` Ville Voutilainen
0 siblings, 1 reply; 73+ messages in thread
From: Jonathan Wakely @ 2021-02-26 21:17 UTC (permalink / raw)
To: Ville Voutilainen
Cc: Jonathan Wakely, Thomas Rodgers, Thiago Macieira, libstdc++
On Fri, 26 Feb 2021, 20:22 Ville Voutilainen via Libstdc++, <
libstdc++@gcc.gnu.org> wrote:
> On Fri, 26 Feb 2021 at 21:26, Jonathan Wakely via Libstdc++
> <libstdc++@gcc.gnu.org> wrote:
> > >Oh? That's not a really widely understood fact. I see people using
> C++20 right
> > >now all the time in the cpplang slack and no one seems to be worried
> that
> > >support is experimental.
> >
> > I can't really help it if people don't read documentation for
> > important tools they rely on.
> >
> > https://gcc.gnu.org/projects/cxx-status.html#cxx20
>
> Also, if it's not a sufficient hint that you need to explicitly tell
> the compiler to enter C++20 mode, there's
> nothing worth our time to protect such users from the perils of the
> world. And trying to remain
> ABI-compatible between experimental C++20 support and its subsequent
> C++20 support absolutely isn't one
> of the things worth our time, even for users that are not idiots. It's
> experimental, and We Mean It.
>
I've considered whether we should put things in an inline namespace until
they're stable, so that when we move it from std::__unstable to std::
there's an ABI break even if nothing else changes. Just to make it clear.
Maybe even a different namespace for each release. std::__gcc11, then
change it to __gcc12 just to make it incompatible.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-26 21:17 ` Jonathan Wakely
@ 2021-02-26 21:18 ` Ville Voutilainen
2021-02-26 21:39 ` Jonathan Wakely
0 siblings, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-26 21:18 UTC (permalink / raw)
To: Jonathan Wakely
Cc: Jonathan Wakely, Thomas Rodgers, Thiago Macieira, libstdc++
On Fri, 26 Feb 2021 at 23:17, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> I've considered whether we should put things in an inline namespace until they're stable, so that when we move it from std::__unstable to std:: there's an ABI break even if nothing else changes. Just to make it clear.
>
> Maybe even a different namespace for each release. std::__gcc11, then change it to __gcc12 just to make it incompatible.
Sounds like overkill to me, just an additional burden to
remove/stabilize that inline namespacing in the happy event that we
don't
actually need to break anything.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-26 21:18 ` Ville Voutilainen
@ 2021-02-26 21:39 ` Jonathan Wakely
0 siblings, 0 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-02-26 21:39 UTC (permalink / raw)
To: Ville Voutilainen
Cc: Jonathan Wakely, Thomas Rodgers, Thiago Macieira, libstdc++
On Fri, 26 Feb 2021, 21:19 Ville Voutilainen, <ville.voutilainen@gmail.com>
wrote:
> On Fri, 26 Feb 2021 at 23:17, Jonathan Wakely <jwakely.gcc@gmail.com>
> wrote:
>
> > I've considered whether we should put things in an inline namespace
> until they're stable, so that when we move it from std::__unstable to std::
> there's an ABI break even if nothing else changes. Just to make it clear.
> >
> > Maybe even a different namespace for each release. std::__gcc11, then
> change it to __gcc12 just to make it incompatible.
>
> Sounds like overkill to me, just an additional burden to
> remove/stabilize that inline namespacing in the happy event that we
> don't
> actually need to break anything.
>
Exactly why I've gone no further than considering it.
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-26 18:47 ` Ville Voutilainen
@ 2021-02-26 23:53 ` Thiago Macieira
2021-02-26 23:58 ` Ville Voutilainen
2021-03-03 14:24 ` Jonathan Wakely
0 siblings, 2 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-02-26 23:53 UTC (permalink / raw)
To: Ville Voutilainen; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++
On Friday, 26 February 2021 10:47:09 PST Ville Voutilainen wrote:
> On Fri, 26 Feb 2021 at 20:43, Thiago Macieira via Libstdc++
>
> <libstdc++@gcc.gnu.org> wrote:
> > Oh? That's not a really widely understood fact. I see people using C++20
> > right now all the time in the cpplang slack and no one seems to be
> > worried that support is experimental.
>
> People writing toys on forums is not real production use that we
> should worry about, and anyone who uses
> a two-months-old standard in production use is going to survive
> possible breaking changes. Or if they won't,
> that's their stupid fault.
Think about this scenario. Let's say GCC 12 declares to C++20 support to be
non-experimental.
Then someone writes a production tool that does:
#if __cpp_lib_atomic_wait >= 201907L
a.wait(n);
#else
fallback(a);
#endif
Should they also prevent people from compiling their code with GCC 11, which
will define the macro and does support "-std=c++20"?
Is there a way to programmatically determine what level of the C++ Standard
Library is officially supported so the requirement can be turned into a test?
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-26 23:53 ` Thiago Macieira
@ 2021-02-26 23:58 ` Ville Voutilainen
2021-02-27 0:11 ` Thiago Macieira
2021-03-03 14:24 ` Jonathan Wakely
1 sibling, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-26 23:58 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++
On Sat, 27 Feb 2021 at 01:53, Thiago Macieira <thiago.macieira@intel.com> wrote:
> Think about this scenario. Let's say GCC 12 declares to C++20 support to be
> non-experimental.
>
> Then someone writes a production tool that does:
>
> #if __cpp_lib_atomic_wait >= 201907L
> a.wait(n);
> #else
> fallback(a);
> #endif
>
> Should they also prevent people from compiling their code with GCC 11, which
> will define the macro and does support "-std=c++20"?
Yes, if there's an implementation bug in GCC 11 that renders the GCC
12 implementation incompatible.
This is not new.
> Is there a way to programmatically determine what level of the C++ Standard
> Library is officially supported so the requirement can be turned into a test?
No general one. Here we are talking about implementation bugs;
feature-testing macros are not
trying to cover those. And in the sunshine cases where we don't have
bugs or binary breaks,
defining the macros early is fine, so we're not going to postpone
defining them, in the hypothetical
event that someone might suggest that.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-26 23:58 ` Ville Voutilainen
@ 2021-02-27 0:11 ` Thiago Macieira
2021-02-27 0:18 ` Ville Voutilainen
2021-02-27 0:22 ` Marc Glisse
0 siblings, 2 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-02-27 0:11 UTC (permalink / raw)
To: Ville Voutilainen; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++
On Friday, 26 February 2021 15:58:01 PST Ville Voutilainen wrote:
> > Should they also prevent people from compiling their code with GCC 11,
> > which will define the macro and does support "-std=c++20"?
>
> Yes, if there's an implementation bug in GCC 11 that renders the GCC
> 12 implementation incompatible.
> This is not new.
One would expect that implementation bugs -- not design flaws -- do not affect
the entire functionality. And in most inline code, that is not a big deal
since the operation was still achieved one way or the other.
What we're talking about here makes the entire set of classes unsuitable for
consumption and dangerous, with no way to programmatically determine when
they've become suitable. My worry is not that someone will rush to use them in
2021. My worry is that well-tested code in 2024-2025 breaks because people are
using a mix of systems that includes Ubuntu 22.04.
> No general one. Here we are talking about implementation bugs;
> feature-testing macros are not
> trying to cover those. And in the sunshine cases where we don't have
> bugs or binary breaks,
> defining the macros early is fine, so we're not going to postpone
> defining them, in the hypothetical
> event that someone might suggest that.
Fair enough, but we know we are not in the sunshine case. We know now, before
the release, that it WILL change and cause silent incompatibilities. Does that
change the conclusion?
How about the other way around: only enable the new, experimental code if the
user defines _GLIBCXX_I_WANT_EXPERIMENTAL before the #includes?
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-02-26 19:31 ` Andreas Schwab
@ 2021-02-27 0:13 ` Thiago Macieira
2021-02-28 21:31 ` Hans-Peter Nilsson
0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-02-27 0:13 UTC (permalink / raw)
To: Andreas Schwab; +Cc: gcc-patches, libstdc++
On Friday, 26 February 2021 11:31:00 PST Andreas Schwab wrote:
> On Feb 26 2021, Thiago Macieira wrote:
> > On Friday, 26 February 2021 10:14:42 PST Andreas Schwab wrote:
> >> On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
> >> > - alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
> >> > + alignas(__alignof__(int)) int _M_a;
> >>
> >> Futexes must be aligned to 4 bytes.
> >
> > Agreed, but doesn't this accomplish that?
>
> No. It uses whatever alignment the type already has, and is an
> elaborate no-op.
I thought so too when I read the original line. But I expected it was written
like that for a reason, especially since the same pattern appears in other
places.
I can change to "alignas(4)" (which is a GCC extension, I believe). Is that
the correct solution?
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-27 0:11 ` Thiago Macieira
@ 2021-02-27 0:18 ` Ville Voutilainen
2021-02-27 0:36 ` Thiago Macieira
2021-02-27 0:22 ` Marc Glisse
1 sibling, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-27 0:18 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++
On Sat, 27 Feb 2021 at 02:11, Thiago Macieira <thiago.macieira@intel.com> wrote:
>
> On Friday, 26 February 2021 15:58:01 PST Ville Voutilainen wrote:
> > > Should they also prevent people from compiling their code with GCC 11,
> > > which will define the macro and does support "-std=c++20"?
> >
> > Yes, if there's an implementation bug in GCC 11 that renders the GCC
> > 12 implementation incompatible.
> > This is not new.
>
> One would expect that implementation bugs -- not design flaws -- do not affect
> the entire functionality. And in most inline code, that is not a big deal
> since the operation was still achieved one way or the other.
>
> What we're talking about here makes the entire set of classes unsuitable for
> consumption and dangerous, with no way to programmatically determine when
> they've become suitable. My worry is not that someone will rush to use them in
> 2021. My worry is that well-tested code in 2024-2025 breaks because people are
> using a mix of systems that includes Ubuntu 22.04.
There's nothing new in that either. I do, however, wonder how
dangerous using the current
implementation actually is? Anyhoo, almost regardless of that, we have
had situations like
that before. While we don't exactly want to repeat them (std::regex
in.. gcc8? comes to mind here),
I'm not entirely convinced that we are talking about a showstopper problem here.
> > No general one. Here we are talking about implementation bugs;
> > feature-testing macros are not
> > trying to cover those. And in the sunshine cases where we don't have
> > bugs or binary breaks,
> > defining the macros early is fine, so we're not going to postpone
> > defining them, in the hypothetical
> > event that someone might suggest that.
>
> Fair enough, but we know we are not in the sunshine case. We know now, before
> the release, that it WILL change and cause silent incompatibilities. Does that
> change the conclusion?
Do tell me what those silent incompatibilities really are. If they're
differences in inlined
code baked into client applications, I'm not sure I see what's so
earth-shattering there.
> How about the other way around: only enable the new, experimental code if the
> user defines _GLIBCXX_I_WANT_EXPERIMENTAL before the #includes?
That seems drastic. We wouldn't get pretty much any use coverage of
any new code,
and that would just sweep the rug from under the whole idea of
feature-testing macros,
half of which is allowing bold users to adopt new facilities early.
Now they would need
to fall back to seventeen different implementation-specific ways to
opt in to that adoption,
which takes us back a decade. Please explain how your concerns about
atomic waits
are *anywhere near* the cost of doing that?
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-27 0:11 ` Thiago Macieira
2021-02-27 0:18 ` Ville Voutilainen
@ 2021-02-27 0:22 ` Marc Glisse
2021-02-27 0:30 ` Ville Voutilainen
2021-02-27 0:43 ` Thiago Macieira
1 sibling, 2 replies; 73+ messages in thread
From: Marc Glisse @ 2021-02-27 0:22 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Ville Voutilainen, Thomas Rodgers, libstdc++
On Fri, 26 Feb 2021, Thiago Macieira via Libstdc++ wrote:
> How about the other way around: only enable the new, experimental code if the
> user defines _GLIBCXX_I_WANT_EXPERIMENTAL before the #includes?
That's exactly what the user did, they passed -std=c++20...
Ok, it might help if the option was called -std=c++2a until it is stable,
or if we defined a macro that advertises if we are in an experimental mode
or not.
But ultimately, it is the user's responsibility to use working (not
experimental) tools. And you can document that your library is only
compatible with gcc-12+.
--
Marc Glisse
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-27 0:22 ` Marc Glisse
@ 2021-02-27 0:30 ` Ville Voutilainen
2021-02-27 0:43 ` Thiago Macieira
1 sibling, 0 replies; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-27 0:30 UTC (permalink / raw)
To: libstdc++; +Cc: Thiago Macieira, Thomas Rodgers
On Sat, 27 Feb 2021 at 02:22, Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Fri, 26 Feb 2021, Thiago Macieira via Libstdc++ wrote:
>
> > How about the other way around: only enable the new, experimental code if the
> > user defines _GLIBCXX_I_WANT_EXPERIMENTAL before the #includes?
>
> That's exactly what the user did, they passed -std=c++20...
>
> Ok, it might help if the option was called -std=c++2a until it is stable,
> or if we defined a macro that advertises if we are in an experimental mode
> or not.
>
> But ultimately, it is the user's responsibility to use working (not
> experimental) tools. And you can document that your library is only
> compatible with gcc-12+.
To be fair, I partially understand where Thiago may be coming from.
Doing that is *horrible* for things like.. say.. Qt.
While it's not nice to opt in to new features for seventeen different
implementations, it's not much fun to disable
features due to bugs in seventeen different implementations either.
(Yeah, seventeen is a bit of an exaggeration.)
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-27 0:18 ` Ville Voutilainen
@ 2021-02-27 0:36 ` Thiago Macieira
2021-02-27 0:44 ` Ville Voutilainen
0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-02-27 0:36 UTC (permalink / raw)
To: Ville Voutilainen; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++
On Friday, 26 February 2021 16:18:33 PST Ville Voutilainen wrote:
> There's nothing new in that either. I do, however, wonder how
> dangerous using the current
> implementation actually is? Anyhoo, almost regardless of that, we have
> had situations like
> that before. While we don't exactly want to repeat them (std::regex
> in.. gcc8? comes to mind here),
> I'm not entirely convinced that we are talking about a showstopper problem
> here.
The danger is not how they're implemented today. The danger is the change.
That would imply a pre-change wait() is never woken up by a post-change
notify() or vice-versa. That in turn causes deadlocks in latches, barriers and
semaphores.
Depending on how the two implementations are achieved, the outcome may also
depend on the order of waits and notifies and which side of the change is
doing which operation. That in turn leads to heisenbugs in production which a
developer will almost never see.
> > Fair enough, but we know we are not in the sunshine case. We know now,
> > before the release, that it WILL change and cause silent
> > incompatibilities. Does that change the conclusion?
>
> Do tell me what those silent incompatibilities really are. If they're
> differences in inlined
> code baked into client applications, I'm not sure I see what's so
> earth-shattering there.
1) size of structures change
2) default template parameters change (silent ODR violations)
3) data protocol changes (whether certain static members are used or not)
> > How about the other way around: only enable the new, experimental code if
> > the user defines _GLIBCXX_I_WANT_EXPERIMENTAL before the #includes?
>
> That seems drastic. We wouldn't get pretty much any use coverage of
> any new code,
> and that would just sweep the rug from under the whole idea of
> feature-testing macros,
> half of which is allowing bold users to adopt new facilities early.
Those goals seem incompatible. If the feature is experimental and not to be
used in production, then we don't want users to adopt the facilities early. We
want them to wait until we declare that it's no longer experimental.
> Now they would need
> to fall back to seventeen different implementation-specific ways to
> opt in to that adoption,
> which takes us back a decade.
I'm the last person to want those multiple ways of detecting when a feature is
safe to be used. You know that. We specifically made it so Qt would NOT use
any new C++14 or C++17 feature from the compiler or the Standard Library until
the macros were present, thus helping force a certain compiler vendor to adopt
the feature macros.
But this is exactly the issue: I *don't* want to have to know that
__cpp_lib_atomic_wait == 201907L is only acceptable if __GNUC_MAJOR__ > 11.
> Please explain how your concerns about atomic waits
> are *anywhere near* the cost of doing that?
Which part? The part about why I think the implementation should be different?
Or the part in which making a change causes future problems?
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-27 0:22 ` Marc Glisse
2021-02-27 0:30 ` Ville Voutilainen
@ 2021-02-27 0:43 ` Thiago Macieira
1 sibling, 0 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-02-27 0:43 UTC (permalink / raw)
To: libstdc++; +Cc: Ville Voutilainen, Thomas Rodgers, libstdc++
On Friday, 26 February 2021 16:22:34 PST Marc Glisse wrote:
> On Fri, 26 Feb 2021, Thiago Macieira via Libstdc++ wrote:
> > How about the other way around: only enable the new, experimental code if
> > the user defines _GLIBCXX_I_WANT_EXPERIMENTAL before the #includes?
>
> That's exactly what the user did, they passed -std=c++20...
>
> Ok, it might help if the option was called -std=c++2a until it is stable,
> or if we defined a macro that advertises if we are in an experimental mode
> or not.
I wouldn't mind that. If GCC documents that it's "c++20" or "c++23" when it's
stable and no incompatibilities are expected (not counting exceptional
circumstances), I would gladly accept it. Do note that it would go against
Ville's email that you replied to that said we want to get early exposure and
coverage. It means a single, obscure Standard Library feature could hold back
the entire flag for the Core Language or vice-versa.
I don't think that is the best outcome, though. I would prefer it more fine-
grained. And yeah, in the sunshine case where everything works, it might come
down to "oops, we made a mistake" (like <filesystem> file time clock).
This email thread is to advise we will have such a case. Are there steps we
can take to avoid it?
> But ultimately, it is the user's responsibility to use working (not
> experimental) tools. And you can document that your library is only
> compatible with gcc-12+.
Give me a programmatic way and I'll be happy.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-27 0:36 ` Thiago Macieira
@ 2021-02-27 0:44 ` Ville Voutilainen
2021-02-27 0:53 ` Thiago Macieira
2021-03-03 14:30 ` Jonathan Wakely
0 siblings, 2 replies; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-27 0:44 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++
On Sat, 27 Feb 2021 at 02:36, Thiago Macieira <thiago.macieira@intel.com> wrote:
>
> On Friday, 26 February 2021 16:18:33 PST Ville Voutilainen wrote:
> > There's nothing new in that either. I do, however, wonder how
> > dangerous using the current
> > implementation actually is? Anyhoo, almost regardless of that, we have
> > had situations like
> > that before. While we don't exactly want to repeat them (std::regex
> > in.. gcc8? comes to mind here),
> > I'm not entirely convinced that we are talking about a showstopper problem
> > here.
>
> The danger is not how they're implemented today. The danger is the change.
> That would imply a pre-change wait() is never woken up by a post-change
> notify() or vice-versa. That in turn causes deadlocks in latches, barriers and
> semaphores.
Then we must be talking about something else than the topic of this
thread, which says "inefficient". :P
I need to re-read your patches, but not until tomorrow.
> > Do tell me what those silent incompatibilities really are. If they're
> > differences in inlined
> > code baked into client applications, I'm not sure I see what's so
> > earth-shattering there.
>
> 1) size of structures change
> 2) default template parameters change (silent ODR violations)
> 3) data protocol changes (whether certain static members are used or not)
In other words, ABI breaks, sure.
> > > How about the other way around: only enable the new, experimental code if
> > > the user defines _GLIBCXX_I_WANT_EXPERIMENTAL before the #includes?
> >
> > That seems drastic. We wouldn't get pretty much any use coverage of
> > any new code,
> > and that would just sweep the rug from under the whole idea of
> > feature-testing macros,
> > half of which is allowing bold users to adopt new facilities early.
>
> Those goals seem incompatible. If the feature is experimental and not to be
> used in production, then we don't want users to adopt the facilities early. We
> want them to wait until we declare that it's no longer experimental.
Well, we want *bold* users to adopt new facilities early. Not all users.
Feature-testing macros don't cover the problem of innocent users accidentally
opting in to something that's in flux; the general expectation is that innocent
users shouldn't do that.
> But this is exactly the issue: I *don't* want to have to know that
> __cpp_lib_atomic_wait == 201907L is only acceptable if __GNUC_MAJOR__ > 11.
Aye. I get that perfectly well.
> > Please explain how your concerns about atomic waits
> > are *anywhere near* the cost of doing that?
> Which part? The part about why I think the implementation should be different?
> Or the part in which making a change causes future problems?
The part where there is an ABI break that's insurmountable. But as I
said, I need to read
your patches with fresh eyes, not with past-midnight eyes.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-27 0:44 ` Ville Voutilainen
@ 2021-02-27 0:53 ` Thiago Macieira
2021-02-27 1:03 ` Ville Voutilainen
2021-03-03 14:30 ` Jonathan Wakely
1 sibling, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-02-27 0:53 UTC (permalink / raw)
To: Ville Voutilainen; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++
On Friday, 26 February 2021 16:44:44 PST Ville Voutilainen wrote:
> On Sat, 27 Feb 2021 at 02:36, Thiago Macieira <thiago.macieira@intel.com>
wrote:
> > The danger is not how they're implemented today. The danger is the change.
> > That would imply a pre-change wait() is never woken up by a post-change
> > notify() or vice-versa. That in turn causes deadlocks in latches, barriers
> > and semaphores.
>
> Then we must be talking about something else than the topic of this
> thread, which says "inefficient". :P
Well, if they were efficient in the first place, we wouldn't need a change. :)
The problem is that they ARE inefficient and they're implemented in such a way
that a change to make them efficient breaks compatibility with the pre-change
implementation.
> I need to re-read your patches, but not until tomorrow.
Fair enough. I'm disconnecting from the work VPN right now and won't be
getting replies. Let me know if I can explain anything, though.
> Well, we want *bold* users to adopt new facilities early. Not all users.
> Feature-testing macros don't cover the problem of innocent users
> accidentally opting in to something that's in flux; the general expectation
> is that innocent users shouldn't do that.
I totally agree.
But how do we allow bold users to adopt the facilities in such a way not to
endanger the other (italic?) users?
> > > Please explain how your concerns about atomic waits
> > > are *anywhere near* the cost of doing that?
> >
> > Which part? The part about why I think the implementation should be
> > different? Or the part in which making a change causes future problems?
>
> The part where there is an ABI break that's insurmountable. But as I
> said, I need to read
> your patches with fresh eyes, not with past-midnight eyes.
Hyvää viikonloppua, Ville.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-27 0:53 ` Thiago Macieira
@ 2021-02-27 1:03 ` Ville Voutilainen
0 siblings, 0 replies; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-27 1:03 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Jonathan Wakely, Thomas Rodgers, libstdc++
On Sat, 27 Feb 2021 at 02:53, Thiago Macieira <thiago.macieira@intel.com> wrote:
> Hyvää viikonloppua, Ville.
Tenha um bom fim de semana, meu amigo. :)
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-25 22:50 C++2a synchronisation inefficient in GCC 11 Thiago Macieira
2021-02-26 11:19 ` Jonathan Wakely
2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
@ 2021-02-27 1:13 ` Thomas Rodgers
2021-02-27 1:29 ` Thomas Rodgers
2021-02-27 2:02 ` Ville Voutilainen
2 siblings, 2 replies; 73+ messages in thread
From: Thomas Rodgers @ 2021-02-27 1:13 UTC (permalink / raw)
To: Thiago Macieira; +Cc: libstdc++, trodgers, jwakely
We are SPECIFICALLY NOT committing to an ABI for wait/notify with GCC11.
These features are experimental and I have no intention of treating this
as a blocker.
On 2021-02-25 14:50, Thiago Macieira via Libstdc++ wrote:
> Hello all
>
> I was investigating the implementation of std::atomic<T>::wait to use
> inside
> my own code (Qt's QMutex and QSemaphore), when I realised that it has
> huge inefficiencies. Since everything in this implementation is inline
> and,
> once released, it will tie our hands until the next ABI break
> (libstdc++.so.
> 7). And no, inline namespaces and ABI tags are no different than ABI
> breaks,
> they only make the ABI break more or less silent in the process.
>
> Here's a summary of the findings:
>
> 1) everything is inline
> 2) futex code is still behind a lot of code calling into
> std::_Hash_bytes
> 3) other int-sized (incl. enums) atomics don't use futex
> 4) std::latch and std::counting_semaphore defaults preclude from using
> futex on Linux
> 5) std::barrier implementation also uses a type that futex(2) can't
> handle
>
> Here's what the code currently looks like:
> https://gcc.godbolt.org/z/MeP8rr
>
> Given how far we are in the release cycle, a simple solution is to
> simply
> disable the entire solution by #if 0'ing out bits/atomic_wait.h. If
> __cpp_lib_atomic_wait isn't defined, the rest of the functionality
> cascades
> disabled too. Then we can spend however long is necessary to get a good
> implementation for GCC 12.
>
> I have patches for all but the first issue, but they are a result of
> only 3
> hours of hacking to prove the concept. I will post them, but I do not
> expect
> them to get merged as-is.
>
> Details:
>
> 1) everything is inline
>
> I think this is unnecessary optimisation, especially since it expands
> to a LOT
> of code (see link above). A cursory research into libc++ and MS STL
> shows that
> the actual wait and notification are out-of-line. I recommend we do the
> same
> for libstdc++: if we're going to syscall anyway, the extra function
> call isn't
> going to be the performance bottleneck.
>
> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050360.html>,
> Jonathan
> wrote:
>
>> Eventually we'll want to move the rest of this function (which doesn't
>> depend on the template argument) into the compiled library, but it's
>> better to be header-only for now.
>
> But that has never happened.
>
> Having out-of-line implementations would allow the implementation for
> generic-
> sized atomic wait/notify to fine-tune as time and experience
> progresses. The
> std::__detail::__waiter implementation looks clever, but it's unclear
> if it is
> the best for all workloads. Moreover, the implementation is limited to
> 16
> waiters.
>
> Moreover, it would allow us to have support in the future for a 64-bit
> futex.
> Linus did shoot down any need for it 13 years ago, but the world has
> changed
> since and it's not inconceivable to have a pointer-sized futex on every
> platform, which on 64-bit would be 64-bit. Windows and Darwin do have
> support
> for 64-bit compare-and-wait functionality.
>
> And it would allow us to have futex-equivalent support in other OSes,
> added at
> our leisure. The current implementation does not support Windows's
> WaitOnAddress and could never support it if released in the current
> state.
> It's also possible other OSes will eventually add equivalent
> functionality
> because of the C++ standard. For example, FreeBSD already has futex()
> support
> inside its kernel in order to run Linux binaries, but it is not exposed
> to the
> FreeBSD-personality syscall ABI.
>
> Inlining certain functionality at a later date is possible. De-inlining
> isn't.
>
> 2) futex code for int is still behind a lot of boiler plate
>
> It comes from the creation/use of std::__detail::__waiter in
> std::__atomic_wait and __atomic_notify. That does allow one to detect
> whether
> there is contention in the first place and avoid the system call, but I
> think
> that's premature optimisation.
>
> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050362.html>,
> Thomas
> says:
>
>> __atomic_notify checks to see if that count is non-zero
>> before invoking the platform/semaphore notify because it is cheaper
>> to do the atomic load than it is to make the syscall() when there are
>> no
>> waiters.
>
> That's true, but it ignores the fact that users of
> std::atomic<T>::wait() are
> likely to track contention by themselves on the value of the atomic (I
> know I
> do). libc++, MS STL and the original atomic_wait[1 [1]] implementation
> operate on
> direct system calls and don't try to track contention. Because of that,
> libstdc++ adding an extra layer of tracking is double book-keeping and
> violates "don't pay for what you don't need".
>
> [1] https://github.com/ogiroux/atomic_wait
>
> 3) other int-sized atomics don't use futex
>
> The kernel doesn't care what we store in those 32 bits, only that they
> are
> comparable (FUTEX_OP does provide less-than and greater-than operations
> that
> are signed, but we don't use that and the kernel API can be later
> extended to
> unsigned compares). Therefore, I would recommend we extend futex
> support to
> all 32-bit signed scalar types. This would add:
> * pointers and long on 32-bit architectures
> * unsigned
> * untyped enums and typed enums on int & unsigned int
> * float
>
> 4) std::latch and std::counting_semaphore defaults too high
>
> The standard requires a ptrdiff_t in the API, but it does allow us to
> set hard
> limits and/or defaults for best performance. Right now, std::latch is
> implemented using ptrdiff_t internally and std::counting_semaphore's
> default
> limit uses PTRDIFF_MAX:
>
> template<ptrdiff_t __least_max_value =
> __gnu_cxx::__int_traits<ptrdiff_t>::__max>
> class counting_semaphore
>
> As a consequence, those two classes will use 64-bit atomics on 64-bit
> architectures, which means they will not use futex. I recommend
> changing
> std::latch's internal to int and lowering the default maximum in
> std::counting_semaphore to INT_MAX.
>
> 5) std::barrier uses a type that futex can't handle
>
> The barrier phase is the actual type that gets waited / notified on,
> but it is
> an enum class on unsigned char:
>
> enum class __barrier_phase_t : unsigned char { };
>
> This is a harder problem to solve, but I really do recommend making it
> 32-bit
> in width. Whether each barrier phase holds only 256 values or the full
> 32
> bits, I don't know (I've implemented with wrapping on 256).
>
> What is of consequence is that the __state_t type could increase in
> size
> ("could" because we can simply reduce the number of tickets in each
> state
> entry). I haven't done enough research on this to know whether it is
> the right
> way to go.
Links:
------
[1] https://github.com/ogiroux/atomic_wait
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-27 1:13 ` C++2a synchronisation inefficient in GCC 11 Thomas Rodgers
@ 2021-02-27 1:29 ` Thomas Rodgers
2021-02-27 3:01 ` Thomas Rodgers
2021-02-27 2:02 ` Ville Voutilainen
1 sibling, 1 reply; 73+ messages in thread
From: Thomas Rodgers @ 2021-02-27 1:29 UTC (permalink / raw)
To: Thiago Macieira; +Cc: trodgers, jwakely, libstdc++
On 2021-02-26 17:13, Thomas Rodgers wrote:
> We are SPECIFICALLY NOT committing to an ABI for wait/notify with
> GCC11. These features are experimental and I have no intention of
> treating this as a blocker.
>
> On 2021-02-25 14:50, Thiago Macieira via Libstdc++ wrote:
>
>> Hello all
>>
>> I was investigating the implementation of std::atomic<T>::wait to use
>> inside
>> my own code (Qt's QMutex and QSemaphore), when I realised that it has
>> huge inefficiencies. Since everything in this implementation is inline
>> and,
>> once released, it will tie our hands until the next ABI break
>> (libstdc++.so.
>> 7). And no, inline namespaces and ABI tags are no different than ABI
>> breaks,
>> they only make the ABI break more or less silent in the process.
>>
>> Here's a summary of the findings:
>>
>> 1) everything is inline
>> 2) futex code is still behind a lot of code calling into
>> std::_Hash_bytes
It's likely possible to remove the dependence on _Hash_bytes. I looked
at it while working on the most recent version of this code but decided
to not touch it for GCC11 because...it's experimental and we
aren't committing to an ABI.
> 3) other int-sized (incl. enums) atomics don't use futex
The most recent version of this code extends the range of types which
use futex to all 32bit types.
> 4) std::latch and std::counting_semaphore defaults preclude from using
> futex on Linux
> 5) std::barrier implementation also uses a type that futex(2) can't
> handle
>
> Here's what the code currently looks like:
> https://gcc.godbolt.org/z/MeP8rr
>
> Given how far we are in the release cycle, a simple solution is to
> simply
> disable the entire solution by #if 0'ing out bits/atomic_wait.h. If
> __cpp_lib_atomic_wait isn't defined, the rest of the functionality
> cascades
> disabled too. Then we can spend however long is necessary to get a good
> implementation for GCC 12.
>
> I have patches for all but the first issue, but they are a result of
> only 3
> hours of hacking to prove the concept. I will post them, but I do not
> expect
> them to get merged as-is.
>
> Details:
>
> 1) everything is inline
>
> I think this is unnecessary optimisation, especially since it expands
> to a LOT
> of code (see link above). A cursory research into libc++ and MS STL
> shows that
> the actual wait and notification are out-of-line. I recommend we do the
> same
> for libstdc++: if we're going to syscall anyway, the extra function
> call isn't
> going to be the performance bottleneck.
>
> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050360.html>,
> Jonathan
> wrote:
>
> Eventually we'll want to move the rest of this function (which doesn't
> depend on the template argument) into the compiled library, but it's
> better to be header-only for now.
> But that has never happened.
>
> Having out-of-line implementations would allow the implementation for
> generic-
> sized atomic wait/notify to fine-tune as time and experience
> progresses. The
> std::__detail::__waiter implementation looks clever, but it's unclear
> if it is
> the best for all workloads. Moreover, the implementation is limited to
> 16
> waiters.
>
> Moreover, it would allow us to have support in the future for a 64-bit
> futex.
> Linus did shoot down any need for it 13 years ago, but the world has
> changed
> since and it's not inconceivable to have a pointer-sized futex on every
> platform, which on 64-bit would be 64-bit. Windows and Darwin do have
> support
> for 64-bit compare-and-wait functionality.
>
> And it would allow us to have futex-equivalent support in other OSes,
> added at
> our leisure. The current implementation does not support Windows's
> WaitOnAddress and could never support it if released in the current
> state.
> It's also possible other OSes will eventually add equivalent
> functionality
> because of the C++ standard. For example, FreeBSD already has futex()
> support
> inside its kernel in order to run Linux binaries, but it is not exposed
> to the
> FreeBSD-personality syscall ABI.
>
> Inlining certain functionality at a later date is possible. De-inlining
> isn't.
>
> 2) futex code for int is still behind a lot of boiler plate
>
> It comes from the creation/use of std::__detail::__waiter in
> std::__atomic_wait and __atomic_notify. That does allow one to detect
> whether
> there is contention in the first place and avoid the system call, but I
> think
> that's premature optimisation.
>
> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050362.html>,
> Thomas
> says:
>
> __atomic_notify checks to see if that count is non-zero
> before invoking the platform/semaphore notify because it is cheaper
> to do the atomic load than it is to make the syscall() when there are
> no
> waiters.
> That's true, but it ignores the fact that users of
> std::atomic<T>::wait() are
> likely to track contention by themselves on the value of the atomic (I
> know I
> do). libc++, MS STL and the original atomic_wait[1 [1]] implementation
> operate on
> direct system calls and don't try to track contention. Because of that,
> libstdc++ adding an extra layer of tracking is double book-keeping and
> violates "don't pay for what you don't need".
>
> [1] https://github.com/ogiroux/atomic_wait
>
> 3) other int-sized atomics don't use futex
>
> The kernel doesn't care what we store in those 32 bits, only that they
> are
> comparable (FUTEX_OP does provide less-than and greater-than operations
> that
> are signed, but we don't use that and the kernel API can be later
> extended to
> unsigned compares). Therefore, I would recommend we extend futex
> support to
> all 32-bit signed scalar types. This would add:
> * pointers and long on 32-bit architectures
> * unsigned
> * untyped enums and typed enums on int & unsigned int
> * float
>
> 4) std::latch and std::counting_semaphore defaults too high
>
> The standard requires a ptrdiff_t in the API, but it does allow us to
> set hard
> limits and/or defaults for best performance. Right now, std::latch is
> implemented using ptrdiff_t internally and std::counting_semaphore's
> default
> limit uses PTRDIFF_MAX:
>
> template<ptrdiff_t __least_max_value =
> __gnu_cxx::__int_traits<ptrdiff_t>::__max>
> class counting_semaphore
>
> As a consequence, those two classes will use 64-bit atomics on 64-bit
> architectures, which means they will not use futex. I recommend
> changing
> std::latch's internal to int and lowering the default maximum in
> std::counting_semaphore to INT_MAX.
>
> 5) std::barrier uses a type that futex can't handle
>
> The barrier phase is the actual type that gets waited / notified on,
> but it is
> an enum class on unsigned char:
>
> enum class __barrier_phase_t : unsigned char { };
>
> This is a harder problem to solve, but I really do recommend making it
> 32-bit
> in width. Whether each barrier phase holds only 256 values or the full
> 32
> bits, I don't know (I've implemented with wrapping on 256).
>
> What is of consequence is that the __state_t type could increase in
> size
> ("could" because we can simply reduce the number of tickets in each
> state
> entry). I haven't done enough research on this to know whether it is
> the right
> way to go.
Links:
------
[1] https://github.com/ogiroux/atomic_wait
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-27 1:13 ` C++2a synchronisation inefficient in GCC 11 Thomas Rodgers
2021-02-27 1:29 ` Thomas Rodgers
@ 2021-02-27 2:02 ` Ville Voutilainen
1 sibling, 0 replies; 73+ messages in thread
From: Ville Voutilainen @ 2021-02-27 2:02 UTC (permalink / raw)
To: Thomas Rodgers
Cc: Thiago Macieira, Thomas Rodgers, Jonathan Wakely, libstdc++
On Sat, 27 Feb 2021 at 03:53, Thomas Rodgers <rodgert@appliantology.com> wrote:
>
> We are SPECIFICALLY NOT committing to an ABI for wait/notify with GCC11.
> These features are experimental and I have no intention of treating this
> as a blocker.
Oh, treating it as a blocker would be madness. But Thiago has a point
- if you have GCC11 code
and GCC13 code, with the same feature-macros and everything compiles
the same, and there's
an ABI break between those, that's not entirely a non-issue. I'm *NOT*
suggesting that that's
a new problem; we did that with std::variant, although I don't exactly
recall whether it was actually
released before I changed the size of its index type depending on the
number of alternatives it can
have. I'm merely saying that it's not a non-issue. I'm yet to
determine how significant this is.
Let's just calmly look at it and ponder.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-27 1:29 ` Thomas Rodgers
@ 2021-02-27 3:01 ` Thomas Rodgers
2021-03-01 17:46 ` Thomas Rodgers
0 siblings, 1 reply; 73+ messages in thread
From: Thomas Rodgers @ 2021-02-27 3:01 UTC (permalink / raw)
To: Thiago Macieira; +Cc: trodgers, jwakely, libstdc++
On 2021-02-26 17:29, Thomas Rodgers wrote:
> On 2021-02-26 17:13, Thomas Rodgers wrote:
>
> We are SPECIFICALLY NOT committing to an ABI for wait/notify with
> GCC11. These features are experimental and I have no intention of
> treating this as a blocker.
>
> On 2021-02-25 14:50, Thiago Macieira via Libstdc++ wrote:
>
> Hello all
>
> I was investigating the implementation of std::atomic<T>::wait to use
> inside
> my own code (Qt's QMutex and QSemaphore), when I realised that it has
> huge inefficiencies. Since everything in this implementation is inline
> and,
> once released, it will tie our hands until the next ABI break
> (libstdc++.so.
> 7). And no, inline namespaces and ABI tags are no different than ABI
> breaks,
> they only make the ABI break more or less silent in the process.
>
> Here's a summary of the findings:
>
> 1) everything is inline
> 2) futex code is still behind a lot of code calling into
> std::_Hash_bytes
It's likely possible to remove the dependence on _Hash_bytes. I looked
at it while working on the most recent version of this code but decided
to not touch it for GCC11 because...it's experimental and we
aren't committing to an ABI.
> 3) other int-sized (incl. enums) atomics don't use futex
The most recent version of this code extends the range of types which
use futex to all 32bit types.
> 4) std::latch and std::counting_semaphore defaults preclude from using
> futex on Linux
> 5) std::barrier implementation also uses a type that futex(2) can't
> handle
>
> Here's what the code currently looks like:
> https://gcc.godbolt.org/z/MeP8rr
>
> Given how far we are in the release cycle, a simple solution is to
> simply
> disable the entire solution by #if 0'ing out bits/atomic_wait.h. If
> __cpp_lib_atomic_wait isn't defined, the rest of the functionality
> cascades
> disabled too. Then we can spend however long is necessary to get a good
> implementation for GCC 12.
>
> I have patches for all but the first issue, but they are a result of
> only 3
> hours of hacking to prove the concept. I will post them, but I do not
> expect
> them to get merged as-is.
>
> Details:
>
> 1) everything is inline
>
> I think this is unnecessary optimisation, especially since it expands
> to a LOT
> of code (see link above). A cursory research into libc++ and MS STL
> shows that
> the actual wait and notification are out-of-line. I recommend we do the
> same
> for libstdc++: if we're going to syscall anyway, the extra function
> call isn't
> going to be the performance bottleneck.
>
> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050360.html>,
> Jonathan
> wrote:
>
> Eventually we'll want to move the rest of this function (which doesn't
> depend on the template argument) into the compiled library, but it's
> better to be header-only for now.
> But that has never happened.
>
> Having out-of-line implementations would allow the implementation for
> generic-
> sized atomic wait/notify to fine-tune as time and experience
> progresses. The
> std::__detail::__waiter implementation looks clever, but it's unclear
> if it is
> the best for all workloads. Moreover, the implementation is limited to
> 16
> waiters.
>
> Moreover, it would allow us to have support in the future for a 64-bit
> futex.
> Linus did shoot down any need for it 13 years ago, but the world has
> changed
> since and it's not inconceivable to have a pointer-sized futex on every
> platform, which on 64-bit would be 64-bit. Windows and Darwin do have
> support
> for 64-bit compare-and-wait functionality.
>
> And it would allow us to have futex-equivalent support in other OSes,
> added at
> our leisure. The current implementation does not support Windows's
> WaitOnAddress and could never support it if released in the current
> state.
> It's also possible other OSes will eventually add equivalent
> functionality
> because of the C++ standard. For example, FreeBSD already has futex()
> support
> inside its kernel in order to run Linux binaries, but it is not exposed
> to the
> FreeBSD-personality syscall ABI.
>
> Inlining certain functionality at a later date is possible. De-inlining
> isn't.
>
> 2) futex code for int is still behind a lot of boiler plate
>
> It comes from the creation/use of std::__detail::__waiter in
> std::__atomic_wait and __atomic_notify. That does allow one to detect
> whether
> there is contention in the first place and avoid the system call, but I
> think
> that's premature optimisation.
>
> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050362.html>,
> Thomas
> says:
>
> __atomic_notify checks to see if that count is non-zero
> before invoking the platform/semaphore notify because it is cheaper
> to do the atomic load than it is to make the syscall() when there are
> no
> waiters.
> That's true, but it ignores the fact that users of
> std::atomic<T>::wait() are
> likely to track contention by themselves on the value of the atomic (I
> know I
> do). libc++, MS STL and the original atomic_wait[1 [1]] implementation
> operate on
> direct system calls and don't try to track contention. Because of that,
> libstdc++ adding an extra layer of tracking is double book-keeping and
> violates "don't pay for what you don't need".
>
> [1] https://github.com/ogiroux/atomic_wait
>
THe final version that Olivier committed to libc++ does track
contention.
> 3) other int-sized atomics don't use futex
>
> The kernel doesn't care what we store in those 32 bits, only that they
> are
> comparable (FUTEX_OP does provide less-than and greater-than operations
> that
> are signed, but we don't use that and the kernel API can be later
> extended to
> unsigned compares). Therefore, I would recommend we extend futex
> support to
> all 32-bit signed scalar types. This would add:
> * pointers and long on 32-bit architectures
> * unsigned
> * untyped enums and typed enums on int & unsigned int
> * float
>
> 4) std::latch and std::counting_semaphore defaults too high
>
> The standard requires a ptrdiff_t in the API, but it does allow us to
> set hard
> limits and/or defaults for best performance. Right now, std::latch is
> implemented using ptrdiff_t internally and std::counting_semaphore's
> default
> limit uses PTRDIFF_MAX:
>
> template<ptrdiff_t __least_max_value =
> __gnu_cxx::__int_traits<ptrdiff_t>::__max>
> class counting_semaphore
>
> As a consequence, those two classes will use 64-bit atomics on 64-bit
> architectures, which means they will not use futex. I recommend
> changing
> std::latch's internal to int and lowering the default maximum in
> std::counting_semaphore to INT_MAX.
>
> 5) std::barrier uses a type that futex can't handle
>
> The barrier phase is the actual type that gets waited / notified on,
> but it is
> an enum class on unsigned char:
>
> enum class __barrier_phase_t : unsigned char { };
>
> This is a harder problem to solve, but I really do recommend making it
> 32-bit
> in width. Whether each barrier phase holds only 256 values or the full
> 32
> bits, I don't know (I've implemented with wrapping on 256).
>
> What is of consequence is that the __state_t type could increase in
> size
> ("could" because we can simply reduce the number of tickets in each
> state
> entry). I haven't done enough research on this to know whether it is
> the right
> way to go.
My understanding from Olivier Giroux, who's barrier implementation code
this is derived from, is that he extensively tested it's performance on
a wide variety of hardware, including time on Summit. Short of hard data
to the contrary, I am not inclined to second guess the implementation he
tested (which uses char tickets).
Links:
------
[1] https://github.com/ogiroux/atomic_wait
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-02-26 15:59 ` [PATCH 4/5] barrier: use int instead of unsigned char for the phase state Thiago Macieira
@ 2021-02-28 15:05 ` Hans-Peter Nilsson
2021-03-01 16:28 ` Thomas Rodgers
2021-03-01 17:24 ` Thiago Macieira
0 siblings, 2 replies; 73+ messages in thread
From: Hans-Peter Nilsson @ 2021-02-28 15:05 UTC (permalink / raw)
To: Thiago Macieira; +Cc: libstdc++, gcc-patches
On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
> ints can be used in futexes. chars can't.
Shouldn't that be an atomic type instead of a bare int then?
> ---
> libstdc++-v3/include/std/barrier | 21 ++++++++++++---------
> 1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
> index e09212dfcb9..ae058bd3dc3 100644
> --- a/libstdc++-v3/include/std/barrier
> +++ b/libstdc++-v3/include/std/barrier
> @@ -70,7 +70,7 @@ It looks different from literature pseudocode for two main reasons:
>
> */
>
> - enum class __barrier_phase_t : unsigned char { };
> + enum class __barrier_phase_t : int { };
brgds, H-P
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-02-27 0:13 ` Thiago Macieira
@ 2021-02-28 21:31 ` Hans-Peter Nilsson
2021-03-01 8:56 ` Richard Biener
2021-03-01 16:32 ` Thomas Rodgers
0 siblings, 2 replies; 73+ messages in thread
From: Hans-Peter Nilsson @ 2021-02-28 21:31 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Andreas Schwab, libstdc++, gcc-patches
On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
> On Friday, 26 February 2021 11:31:00 PST Andreas Schwab wrote:
> > On Feb 26 2021, Thiago Macieira wrote:
> > > On Friday, 26 February 2021 10:14:42 PST Andreas Schwab wrote:
> > >> On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
> > >> > - alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
> > >> > + alignas(__alignof__(int)) int _M_a;
> > >>
> > >> Futexes must be aligned to 4 bytes.
> > >
> > > Agreed, but doesn't this accomplish that?
> >
> > No. It uses whatever alignment the type already has, and is an
> > elaborate no-op.
>
> I thought so too when I read the original line. But I expected it was written
> like that for a reason, especially since the same pattern appears in other
> places.
>
> I can change to "alignas(4)" (which is a GCC extension, I believe). Is that
> the correct solution?
IMNSHO make use of the corresponding atomic type. Then there'd
be no need for separate what's-the-right-align-curse games.
brgds, H-P
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-02-28 21:31 ` Hans-Peter Nilsson
@ 2021-03-01 8:56 ` Richard Biener
2021-03-03 14:56 ` Jonathan Wakely
2021-03-01 16:32 ` Thomas Rodgers
1 sibling, 1 reply; 73+ messages in thread
From: Richard Biener @ 2021-03-01 8:56 UTC (permalink / raw)
To: Hans-Peter Nilsson
Cc: Thiago Macieira, GCC Patches, libstdc++, Andreas Schwab
On Sun, Feb 28, 2021 at 10:53 PM Hans-Peter Nilsson <hp@bitrange.com> wrote:
>
>
>
> On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
>
> > On Friday, 26 February 2021 11:31:00 PST Andreas Schwab wrote:
> > > On Feb 26 2021, Thiago Macieira wrote:
> > > > On Friday, 26 February 2021 10:14:42 PST Andreas Schwab wrote:
> > > >> On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
> > > >> > - alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
> > > >> > + alignas(__alignof__(int)) int _M_a;
> > > >>
> > > >> Futexes must be aligned to 4 bytes.
> > > >
> > > > Agreed, but doesn't this accomplish that?
> > >
> > > No. It uses whatever alignment the type already has, and is an
> > > elaborate no-op.
> >
> > I thought so too when I read the original line. But I expected it was written
> > like that for a reason, especially since the same pattern appears in other
> > places.
> >
> > I can change to "alignas(4)" (which is a GCC extension, I believe). Is that
> > the correct solution?
>
> IMNSHO make use of the corresponding atomic type. Then there'd
> be no need for separate what's-the-right-align-curse games.
Or align as the corresponding atomic type (in case using an actual
std::atomic<int> is undesirable). OTOH the proposed code isn't
any more bogus than the previous ...
Richard.
> brgds, H-P
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-02-28 15:05 ` Hans-Peter Nilsson
@ 2021-03-01 16:28 ` Thomas Rodgers
2021-03-01 17:24 ` Thiago Macieira
1 sibling, 0 replies; 73+ messages in thread
From: Thomas Rodgers @ 2021-03-01 16:28 UTC (permalink / raw)
To: Hans-Peter Nilsson; +Cc: libstdc++, gcc-patches
On 2021-02-28 07:05, Hans-Peter Nilsson wrote:
> On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
>
>> ints can be used in futexes. chars can't.
>
> Shouldn't that be an atomic type instead of a bare int then?
>
It's an atomic_ref.
>> ---
>> libstdc++-v3/include/std/barrier | 21 ++++++++++++---------
>> 1 file changed, 12 insertions(+), 9 deletions(-)
>>
>> diff --git a/libstdc++-v3/include/std/barrier
>> b/libstdc++-v3/include/std/barrier
>> index e09212dfcb9..ae058bd3dc3 100644
>> --- a/libstdc++-v3/include/std/barrier
>> +++ b/libstdc++-v3/include/std/barrier
>> @@ -70,7 +70,7 @@ It looks different from literature pseudocode for
>> two main reasons:
>>
>> */
>>
>> - enum class __barrier_phase_t : unsigned char { };
>> + enum class __barrier_phase_t : int { };
>
> brgds, H-P
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-02-28 21:31 ` Hans-Peter Nilsson
2021-03-01 8:56 ` Richard Biener
@ 2021-03-01 16:32 ` Thomas Rodgers
1 sibling, 0 replies; 73+ messages in thread
From: Thomas Rodgers @ 2021-03-01 16:32 UTC (permalink / raw)
To: Hans-Peter Nilsson
Cc: Thiago Macieira, gcc-patches, libstdc++, Andreas Schwab
On 2021-02-28 13:31, Hans-Peter Nilsson wrote:
>> On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
>>
>> On Friday, 26 February 2021 11:31:00 PST Andreas Schwab wrote: On Feb
>> 26 2021, Thiago Macieira wrote: On Friday, 26 February 2021 10:14:42
>> PST Andreas Schwab wrote: On Feb 26 2021, Thiago Macieira via
>> Gcc-patches wrote: - alignas(__alignof__(ptrdiff_t)) ptrdiff_t
>> _M_a;
>> + alignas(__alignof__(int)) int _M_a;
>> Futexes must be aligned to 4 bytes.
>>
>> Agreed, but doesn't this accomplish that?
> No. It uses whatever alignment the type already has, and is an
> elaborate no-op.
> I thought so too when I read the original line. But I expected it was
> written
> like that for a reason, especially since the same pattern appears in
> other
> places.
> I can change to "alignas(4)" (which is a GCC extension, I believe). Is
> that
> the correct solution?
> IMNSHO make use of the corresponding atomic type. Then there'd
> be no need for separate what's-the-right-align-curse games.
There is no predicate wait on atomic<T>.
> brgds, H-P
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-02-28 15:05 ` Hans-Peter Nilsson
2021-03-01 16:28 ` Thomas Rodgers
@ 2021-03-01 17:24 ` Thiago Macieira
2021-03-01 17:38 ` Thomas Rodgers
2021-03-01 18:12 ` Ville Voutilainen
1 sibling, 2 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 17:24 UTC (permalink / raw)
To: Hans-Peter Nilsson; +Cc: libstdc++, gcc-patches
On Sunday, 28 February 2021 07:05:47 PST Hans-Peter Nilsson wrote:
> On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
> > ints can be used in futexes. chars can't.
>
> Shouldn't that be an atomic type instead of a bare int then?
There are a couple of atomic_refs:
using __atomic_phase_ref_t = std::__atomic_ref<__barrier_phase_t>;
using __atomic_phase_const_ref_t = std::__atomic_ref<const
__barrier_phase_t>;
And _M_phase, despite being non-atomic, is never accessed without the
atomic_ref, aside from the constructor. Both arrive() and wait() start off by
creating the atomic_ref.
But I confess I don't understand this code sufficiently to say it is correct.
I'm simply saying that waiting on unsigned chars will not use a futex, at
least until https://lkml.org/lkml/2019/12/4/1373 is merged into the kernel.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-03-01 17:24 ` Thiago Macieira
@ 2021-03-01 17:38 ` Thomas Rodgers
2021-03-01 17:40 ` Thomas Rodgers
2021-03-01 18:06 ` Thiago Macieira
2021-03-01 18:12 ` Ville Voutilainen
1 sibling, 2 replies; 73+ messages in thread
From: Thomas Rodgers @ 2021-03-01 17:38 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Hans-Peter Nilsson, libstdc++, gcc-patches
On 2021-03-01 09:24, Thiago Macieira via Libstdc++ wrote:
> On Sunday, 28 February 2021 07:05:47 PST Hans-Peter Nilsson wrote: On
> Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote: ints can be
> used in futexes. chars can't.
> Shouldn't that be an atomic type instead of a bare int then?
> There are a couple of atomic_refs:
>
> using __atomic_phase_ref_t = std::__atomic_ref<__barrier_phase_t>;
> using __atomic_phase_const_ref_t = std::__atomic_ref<const
> __barrier_phase_t>;
>
> And _M_phase, despite being non-atomic, is never accessed without the
> atomic_ref, aside from the constructor. Both arrive() and wait() start
> off by
> creating the atomic_ref.
If it's non-atomic, then how is wait() supposed to wait on it,
atomically?
> But I confess I don't understand this code sufficiently to say it is
> correct.
> I'm simply saying that waiting on unsigned chars will not use a futex,
> at
> least until https://lkml.org/lkml/2019/12/4/1373 is merged into the
> kernel.
And I am not disagreeing with that. I am, however saying, that I know
this particular implementation (well the upstream one it is based on)
has been extensively tested by the original author (ogiroux) including
time on Summit. If we are going to start changing his design decisions
(beyond the largely cosmetic, not algorithmic, ones that I have made as
per Jonathan's request), they should be motivated by more than a 'well
we feel int's would be better here because Futex' justification.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-03-01 17:38 ` Thomas Rodgers
@ 2021-03-01 17:40 ` Thomas Rodgers
2021-03-01 18:06 ` Thiago Macieira
1 sibling, 0 replies; 73+ messages in thread
From: Thomas Rodgers @ 2021-03-01 17:40 UTC (permalink / raw)
To: libstdc++, gcc-patches; +Cc: Hans-Peter Nilsson
Botched replying to the list, sending again
On 2021-03-01 09:38, Thomas Rodgers wrote:
> On 2021-03-01 09:24, Thiago Macieira via Libstdc++ wrote:
>
>> On Sunday, 28 February 2021 07:05:47 PST Hans-Peter Nilsson wrote: On
>> Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote: ints can be
>> used in futexes. chars can't.
>> Shouldn't that be an atomic type instead of a bare int then?
>
>> There are a couple of atomic_refs:
>>
>> using __atomic_phase_ref_t = std::__atomic_ref<__barrier_phase_t>;
>> using __atomic_phase_const_ref_t = std::__atomic_ref<const
>> __barrier_phase_t>;
>>
>> And _M_phase, despite being non-atomic, is never accessed without the
>> atomic_ref, aside from the constructor. Both arrive() and wait() start
>> off by
>> creating the atomic_ref.
>
> If it's non-atomic, then how is wait() supposed to wait on it,
> atomically?
>
>> But I confess I don't understand this code sufficiently to say it is
>> correct.
>> I'm simply saying that waiting on unsigned chars will not use a futex,
>> at
>> least until https://lkml.org/lkml/2019/12/4/1373 is merged into the
>> kernel.
>
> And I am not disagreeing with that. I am, however saying, that I know
> this particular implementation (well the upstream one it is based on)
> has been extensively tested by the original author (ogiroux) including
> time on Summit. If we are going to start changing his design decisions
> (beyond the largely cosmetic, not algorithmic, ones that I have made as
> per Jonathan's request), they should be motivated by more than a 'well
> we feel int's would be better here because Futex' justification.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-27 3:01 ` Thomas Rodgers
@ 2021-03-01 17:46 ` Thomas Rodgers
2021-03-01 18:00 ` Thiago Macieira
0 siblings, 1 reply; 73+ messages in thread
From: Thomas Rodgers @ 2021-03-01 17:46 UTC (permalink / raw)
To: Thiago Macieira; +Cc: trodgers, jwakely, libstdc++
I discussed the prematureness of the contention optimization with
ogiroux this morning. Confirming, libc++ does do this, and in his
measurements it saves on the order of 50us.
Regarding the use of _Hash_impl::hash(), that can be changed to just use
the pointer value itself. This is what libatomic does. I'm happy to
propose that as a subsequent patch to the most recent version of this
code which I submitted recently.
On 2021-02-26 19:01, Thomas Rodgers wrote:
> On 2021-02-26 17:29, Thomas Rodgers wrote:
>
>> On 2021-02-26 17:13, Thomas Rodgers wrote:
>>
>> We are SPECIFICALLY NOT committing to an ABI for wait/notify with
>> GCC11. These features are experimental and I have no intention of
>> treating this as a blocker.
>>
>> On 2021-02-25 14:50, Thiago Macieira via Libstdc++ wrote:
>>
>> Hello all
>>
>> I was investigating the implementation of std::atomic<T>::wait to use
>> inside
>> my own code (Qt's QMutex and QSemaphore), when I realised that it has
>> huge inefficiencies. Since everything in this implementation is inline
>> and,
>> once released, it will tie our hands until the next ABI break
>> (libstdc++.so.
>> 7). And no, inline namespaces and ABI tags are no different than ABI
>> breaks,
>> they only make the ABI break more or less silent in the process.
>>
>> Here's a summary of the findings:
>>
>> 1) everything is inline
>> 2) futex code is still behind a lot of code calling into
>> std::_Hash_bytes
>
> It's likely possible to remove the dependence on _Hash_bytes. I looked
> at it while working on the most recent version of this code but decided
> to not touch it for GCC11 because...it's experimental and we
> aren't committing to an ABI.
>
>> 3) other int-sized (incl. enums) atomics don't use futex
>
> The most recent version of this code extends the range of types which
> use futex to all 32bit types.
>
>> 4) std::latch and std::counting_semaphore defaults preclude from using
>> futex on Linux
>> 5) std::barrier implementation also uses a type that futex(2) can't
>> handle
>>
>> Here's what the code currently looks like:
>> https://gcc.godbolt.org/z/MeP8rr
>>
>> Given how far we are in the release cycle, a simple solution is to
>> simply
>> disable the entire solution by #if 0'ing out bits/atomic_wait.h. If
>> __cpp_lib_atomic_wait isn't defined, the rest of the functionality
>> cascades
>> disabled too. Then we can spend however long is necessary to get a
>> good
>> implementation for GCC 12.
>>
>> I have patches for all but the first issue, but they are a result of
>> only 3
>> hours of hacking to prove the concept. I will post them, but I do not
>> expect
>> them to get merged as-is.
>>
>> Details:
>>
>> 1) everything is inline
>>
>> I think this is unnecessary optimisation, especially since it expands
>> to a LOT
>> of code (see link above). A cursory research into libc++ and MS STL
>> shows that
>> the actual wait and notification are out-of-line. I recommend we do
>> the same
>> for libstdc++: if we're going to syscall anyway, the extra function
>> call isn't
>> going to be the performance bottleneck.
>>
>> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050360.html>,
>> Jonathan
>> wrote:
>>
>> Eventually we'll want to move the rest of this function (which doesn't
>> depend on the template argument) into the compiled library, but it's
>> better to be header-only for now.
>> But that has never happened.
>>
>> Having out-of-line implementations would allow the implementation for
>> generic-
>> sized atomic wait/notify to fine-tune as time and experience
>> progresses. The
>> std::__detail::__waiter implementation looks clever, but it's unclear
>> if it is
>> the best for all workloads. Moreover, the implementation is limited to
>> 16
>> waiters.
>>
>> Moreover, it would allow us to have support in the future for a 64-bit
>> futex.
>> Linus did shoot down any need for it 13 years ago, but the world has
>> changed
>> since and it's not inconceivable to have a pointer-sized futex on
>> every
>> platform, which on 64-bit would be 64-bit. Windows and Darwin do have
>> support
>> for 64-bit compare-and-wait functionality.
>>
>> And it would allow us to have futex-equivalent support in other OSes,
>> added at
>> our leisure. The current implementation does not support Windows's
>> WaitOnAddress and could never support it if released in the current
>> state.
>> It's also possible other OSes will eventually add equivalent
>> functionality
>> because of the C++ standard. For example, FreeBSD already has futex()
>> support
>> inside its kernel in order to run Linux binaries, but it is not
>> exposed to the
>> FreeBSD-personality syscall ABI.
>>
>> Inlining certain functionality at a later date is possible.
>> De-inlining isn't.
>>
>> 2) futex code for int is still behind a lot of boiler plate
>>
>> It comes from the creation/use of std::__detail::__waiter in
>> std::__atomic_wait and __atomic_notify. That does allow one to detect
>> whether
>> there is contention in the first place and avoid the system call, but
>> I think
>> that's premature optimisation.
>>
>> In <https://gcc.gnu.org/pipermail/libstdc++/2020-May/050362.html>,
>> Thomas
>> says:
>>
>> __atomic_notify checks to see if that count is non-zero
>> before invoking the platform/semaphore notify because it is cheaper
>> to do the atomic load than it is to make the syscall() when there are
>> no
>> waiters.
>> That's true, but it ignores the fact that users of
>> std::atomic<T>::wait() are
>> likely to track contention by themselves on the value of the atomic (I
>> know I
>> do). libc++, MS STL and the original atomic_wait[1 [1]] implementation
>> operate on
>> direct system calls and don't try to track contention. Because of
>> that,
>> libstdc++ adding an extra layer of tracking is double book-keeping and
>> violates "don't pay for what you don't need".
>>
>> [1] https://github.com/ogiroux/atomic_wait
>
> THe final version that Olivier committed to libc++ does track
> contention.
>
>> 3) other int-sized atomics don't use futex
>>
>> The kernel doesn't care what we store in those 32 bits, only that they
>> are
>> comparable (FUTEX_OP does provide less-than and greater-than
>> operations that
>> are signed, but we don't use that and the kernel API can be later
>> extended to
>> unsigned compares). Therefore, I would recommend we extend futex
>> support to
>> all 32-bit signed scalar types. This would add:
>> * pointers and long on 32-bit architectures
>> * unsigned
>> * untyped enums and typed enums on int & unsigned int
>> * float
>>
>> 4) std::latch and std::counting_semaphore defaults too high
>>
>> The standard requires a ptrdiff_t in the API, but it does allow us to
>> set hard
>> limits and/or defaults for best performance. Right now, std::latch is
>> implemented using ptrdiff_t internally and std::counting_semaphore's
>> default
>> limit uses PTRDIFF_MAX:
>>
>> template<ptrdiff_t __least_max_value =
>> __gnu_cxx::__int_traits<ptrdiff_t>::__max>
>> class counting_semaphore
>>
>> As a consequence, those two classes will use 64-bit atomics on 64-bit
>> architectures, which means they will not use futex. I recommend
>> changing
>> std::latch's internal to int and lowering the default maximum in
>> std::counting_semaphore to INT_MAX.
>>
>> 5) std::barrier uses a type that futex can't handle
>>
>> The barrier phase is the actual type that gets waited / notified on,
>> but it is
>> an enum class on unsigned char:
>>
>> enum class __barrier_phase_t : unsigned char { };
>>
>> This is a harder problem to solve, but I really do recommend making it
>> 32-bit
>> in width. Whether each barrier phase holds only 256 values or the full
>> 32
>> bits, I don't know (I've implemented with wrapping on 256).
>>
>> What is of consequence is that the __state_t type could increase in
>> size
>> ("could" because we can simply reduce the number of tickets in each
>> state
>> entry). I haven't done enough research on this to know whether it is
>> the right
>> way to go.
>
> My understanding from Olivier Giroux, who's barrier implementation code
> this is derived from, is that he extensively tested it's performance on
> a wide variety of hardware, including time on Summit. Short of hard
> data to the contrary, I am not inclined to second guess the
> implementation he tested (which uses char tickets).
>
> Links:
> ------
> [1] https://github.com/ogiroux/atomic_wait
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-03-01 17:46 ` Thomas Rodgers
@ 2021-03-01 18:00 ` Thiago Macieira
2021-03-01 18:34 ` Thomas Rodgers
0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 18:00 UTC (permalink / raw)
To: Thomas Rodgers; +Cc: trodgers, jwakely, libstdc++
On Monday, 1 March 2021 09:46:08 PST Thomas Rodgers wrote:
> I discussed the prematureness of the contention optimization with ogiroux
> this morning. Confirming, libc++ does do this, and in his measurements it
> saves on the order of 50us.
>
> Regarding the use of _Hash_impl::hash(), that can be changed to just use the
> pointer value itself. This is what libatomic does. I'm happy to propose
> that as a subsequent patch to the most recent version of this code which I
> submitted recently.
Is the benchmark source somewhere I can take a look at?
I'm worried that it is benchmarking the wrong thing. Can we benchmark latch
and counting_semaphore instead? Those already track contention on the waitable
atomic by themselves.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-03-01 17:38 ` Thomas Rodgers
2021-03-01 17:40 ` Thomas Rodgers
@ 2021-03-01 18:06 ` Thiago Macieira
2021-03-01 19:08 ` Thomas Rodgers
1 sibling, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 18:06 UTC (permalink / raw)
To: Thomas Rodgers; +Cc: Hans-Peter Nilsson, libstdc++, gcc-patches
On Monday, 1 March 2021 09:38:58 PST Thomas Rodgers wrote:
> > And _M_phase, despite being non-atomic, is never accessed without the
> > atomic_ref, aside from the constructor. Both arrive() and wait() start
> > off by
> > creating the atomic_ref.
>
> If it's non-atomic, then how is wait() supposed to wait on it,
> atomically?
Hey, it's your code :-)
It is using atomics to operate on the value. It's just that the type on the
structure isn't an atomic by itself.
Why, I don't know. Olivier's original code did use atomics
<https://github.com/ogiroux/atomic_wait/blob/master/include/barrier#L55-L56>:
std::atomic<ptrdiff_t> expected_adjustment;
std::atomic<__phase_t> phase;
> And I am not disagreeing with that. I am, however saying, that I know
> this particular implementation (well the upstream one it is based on)
> has been extensively tested by the original author (ogiroux) including
> time on Summit. If we are going to start changing his design decisions
> (beyond the largely cosmetic, not algorithmic, ones that I have made as
> per Jonathan's request), they should be motivated by more than a 'well
> we feel int's would be better here because Futex' justification.
That's a reasonable request.
I'd like to see the benchmark results of using a directly-futexable type vs
using unsigned char. But even the timing results need to be weighed against
the increased memory use for std::barrier, which means it's not a directly-
objective conclusion. And given we *may* get 8-bit futexes, it might be worth
keeping them this way and just tell people with large contentions to upgrade.
That of course rests on the contended atomic_wait being out-of-line.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-03-01 17:24 ` Thiago Macieira
2021-03-01 17:38 ` Thomas Rodgers
@ 2021-03-01 18:12 ` Ville Voutilainen
2021-03-01 19:44 ` Thiago Macieira
1 sibling, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-03-01 18:12 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Hans-Peter Nilsson, libstdc++, gcc-patches List
On Mon, 1 Mar 2021 at 20:09, Thiago Macieira via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
>
> On Sunday, 28 February 2021 07:05:47 PST Hans-Peter Nilsson wrote:
> > On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
> > > ints can be used in futexes. chars can't.
> >
> > Shouldn't that be an atomic type instead of a bare int then?
>
> There are a couple of atomic_refs:
>
> using __atomic_phase_ref_t = std::__atomic_ref<__barrier_phase_t>;
> using __atomic_phase_const_ref_t = std::__atomic_ref<const
> __barrier_phase_t>;
>
> And _M_phase, despite being non-atomic, is never accessed without the
> atomic_ref, aside from the constructor. Both arrive() and wait() start off by
> creating the atomic_ref.
>
> But I confess I don't understand this code sufficiently to say it is correct.
> I'm simply saying that waiting on unsigned chars will not use a futex, at
> least until https://lkml.org/lkml/2019/12/4/1373 is merged into the kernel.
I do have a question about the intent/concern here, regardless of what
your patch technically
does. The ABI break _is_ your concern, and the "heisenbugs" you were
worried about would
in fact be caused by the ABI break? So if you embed these things in
your QAtomicThing, the ABI
break may mess it up(*)? Is that a correct understanding of the concern here?
(*) And inlining might not help because users might compile a
different std::atomic_thing in their
program than what your DSO has been compiled with, and you may end up
using mixtures.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-03-01 18:00 ` Thiago Macieira
@ 2021-03-01 18:34 ` Thomas Rodgers
2021-03-01 19:11 ` Thiago Macieira
0 siblings, 1 reply; 73+ messages in thread
From: Thomas Rodgers @ 2021-03-01 18:34 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Thomas Rodgers, jwakely, libstdc++
----- Original Message -----
> On Monday, 1 March 2021 09:46:08 PST Thomas Rodgers wrote:
> > I discussed the prematureness of the contention optimization with ogiroux
> > this morning. Confirming, libc++ does do this, and in his measurements it
> > saves on the order of 50us.
> >
> > Regarding the use of _Hash_impl::hash(), that can be changed to just use
> > the
> > pointer value itself. This is what libatomic does. I'm happy to propose
> > that as a subsequent patch to the most recent version of this code which I
> > submitted recently.
>
> Is the benchmark source somewhere I can take a look at?
>
I'd have to ask Olivier.
> I'm worried that it is benchmarking the wrong thing. Can we benchmark latch
> and counting_semaphore instead? Those already track contention on the
> waitable
> atomic by themselves.
>
So is your concern here that e.g. latch::count_down() will do an extra atomic load?
> --
> Thiago Macieira - thiago.macieira (AT) intel.com
> Software Architect - Intel DPG Cloud Engineering
>
>
>
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-03-01 18:06 ` Thiago Macieira
@ 2021-03-01 19:08 ` Thomas Rodgers
0 siblings, 0 replies; 73+ messages in thread
From: Thomas Rodgers @ 2021-03-01 19:08 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Hans-Peter Nilsson, libstdc++, gcc-patches
On 2021-03-01 10:06, Thiago Macieira wrote:
> On Monday, 1 March 2021 09:38:58 PST Thomas Rodgers wrote: And
> _M_phase, despite being non-atomic, is never accessed without the
> atomic_ref, aside from the constructor. Both arrive() and wait() start
> off by
> creating the atomic_ref.
> If it's non-atomic, then how is wait() supposed to wait on it,
> atomically?
> Hey, it's your code :-)
>
> It is using atomics to operate on the value. It's just that the type on
> the
> structure isn't an atomic by itself.
> Why, I don't know. Olivier's original code did use atomics
> <https://github.com/ogiroux/atomic_wait/blob/master/include/barrier#L55-L56>:
> std::atomic<ptrdiff_t> expected_adjustment;
> std::atomic<__phase_t> phase;
It is atomic, in that it is always an atomic_ref. This change was made
at
Jonathan's request.
> And I am not disagreeing with that. I am, however saying, that I know
> this particular implementation (well the upstream one it is based on)
> has been extensively tested by the original author (ogiroux) including
> time on Summit. If we are going to start changing his design decisions
> (beyond the largely cosmetic, not algorithmic, ones that I have made as
> per Jonathan's request), they should be motivated by more than a 'well
> we feel int's would be better here because Futex' justification.
> That's a reasonable request.
>
> I'd like to see the benchmark results of using a directly-futexable
> type vs
> using unsigned char. But even the timing results need to be weighed
> against
> the increased memory use for std::barrier, which means it's not a
> directly-
> objective conclusion. And given we *may* get 8-bit futexes, it might be
> worth
> keeping them this way and just tell people with large contentions to
> upgrade.
We may also want to introduce a central barrier type for memory
constrained environments. I specifically
removed that from this patch because it -
1) wasn't clear how we'd go about making that decision
2) this support in GCC11 is experimental
> That of course rests on the contended atomic_wait being out-of-line.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-03-01 18:34 ` Thomas Rodgers
@ 2021-03-01 19:11 ` Thiago Macieira
0 siblings, 0 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 19:11 UTC (permalink / raw)
To: Thomas Rodgers; +Cc: Thomas Rodgers, jwakely, libstdc++
On Monday, 1 March 2021 10:34:22 PST Thomas Rodgers wrote:
> > I'm worried that it is benchmarking the wrong thing. Can we benchmark
> > latch
> > and counting_semaphore instead? Those already track contention on the
> > waitable
> > atomic by themselves.
>
> So is your concern here that e.g. latch::count_down() will do an extra
> atomic load?
I'm concerned we're violating "Don't Pay for What You Don't Need". Waiting
algorithms should be written to track the contention by themselves. If they
say they want to wake or wait, the runtime shouldn't second-guess them.
Take a look at how glibc's sem_post and sem_wait are implemented. On systems
with 64-bit atomics, they store the number of waiters in the high 32-bit; on
systems without, they store a single bit indicating whether there's anyone
waiting. The 64-bit code summarises to:
void acquire()
{
auto &low = low_half_by_ref(); // handle big-endian, reinterpret_cast,
uint64_t d = _M_counter.fetch_add(1ULL << 32);
for (;;) {
if ((d & SEM_VALUE_MASK) == 0) {
low.wait(d);
d = _M_counter.load(std::memory_order_relaxed);
} else {
if (_M_counter.compare_exchange_strong(d, d - 1, ...))
break;
}
_M_counter.fetch_sub(1ULL << 32);
}
void release(int n = 1)
{
auto &low = low_half_by_ref(); // handle big-endian, reinterpret_cast,
uint64_d = _M_counter.fetch_add(n);
if (d >> SEM_NWAITERS_SHIFT)
if (n == 1)
low.notify_one();
else
low.notify_all(); // ought to be "notify_many(n)"
}
It's even simpler in the latch, where any value different from "the last to
arrive" is "contended".
_GLIBCXX_ALWAYS_INLINE void
count_down(ptrdiff_t __update = 1)
{
auto const __old = __atomic_impl::fetch_sub(&_M_a,
__update, memory_order::release);
if (__old == __update)
__atomic_impl::notify_all(&_M_a);
}
There's no need to track contention again here. The only case where it would
be useful is if there's only one thread operating on this latch, but if that's
the case then why are you using a latch in the first place? And again, if
that's an issue, then we can use the LSB or MSB to indicate that there are
waiters waiting (we don't need a count of waiters).
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-03-01 18:12 ` Ville Voutilainen
@ 2021-03-01 19:44 ` Thiago Macieira
2021-03-01 20:35 ` Ville Voutilainen
0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 19:44 UTC (permalink / raw)
To: Ville Voutilainen, libstdc++; +Cc: gcc-patches List
On Monday, 1 March 2021 10:12:35 PST Ville Voutilainen wrote:
> I do have a question about the intent/concern here, regardless of what
> your patch technically
> does. The ABI break _is_ your concern, and the "heisenbugs" you were
> worried about would
> in fact be caused by the ABI break? So if you embed these things in
> your QAtomicThing, the ABI
> break may mess it up(*)? Is that a correct understanding of the concern
> here?
Let me clarify. I am stating that the current implementation is inefficient
because Linux currently only implements 32-bit futexes. With that in mind, I
am arguing we need to change the implementation in a way that will break ABI
in a new release.
The concern is that such a change, despite being in experimental code for
which ABI stability has never been promised, is still troublesome. The problem
there is that even people who have read the documentation and waited until
2024 to write code using the feature may still be affected. This isn't about
Qt because I have no plans to use wait and notify in inline API.
But you can see someone doing:
#if __cplusplus >= 202002L && __has_include(<latch>)
# include <latch>
#else
# error "Please upgrade your compiler & standard library"
#endif
and using <latch> in their inline code. And as you say, if they then mix DSOs
compiled with different versions of GCC, one of them might be the old,
experimental and binary-incompatible code. Remember that before April 2024,
the most recent Ubuntu LTS will be 22.04, which will be released before GCC
12, which means it will contain GCC 11.
So, wholly aside from how we fix the inefficiencies, we must decide how to
deal with the ABI break. We can:
a) not have an ABI break in the first place, by having the above code not
compile with GCC until we promise ABI compatibility
b) cause a non-silent ABI break
c) accept the silent ABI break
I advocate for (a) and vehemently disagree with (c). Meanwhile, (b) can be
something like (untested!):
_GLIBCXX_BEGIN_NAMESPACE_VERSION
namespace __detail
{
+ struct __waiters;
+ inline namespace __experimental
+ {
+ // from atomic_wait.cc
+ __waiters &__get_waiter(const void *);
+ }
using __platform_wait_t = int;
constexpr auto __atomic_spin_count_1 = 16;
@@ -187,11 +193,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
static __waiters&
_S_for(const void* __t)
{
- const unsigned char __mask = 0xf;
- static __waiters __w[__mask + 1];
-
- auto __key = _Hash_impl::hash(__t) & __mask;
- return __w[__key];
+ return __get_waiter(__t);
}
};
That way, any DSO compiled using GCC 11 will fail to load when using GCC 12's
libstdc++. And probably put "std::__detail::__experimental" in the
GLIBCXX_EXPERIMENTAL ELF version so it's even clearer that it's experimental,
thus helping Linux distributions and other binary artefact distributors
realise they've made a mistake.
I still don't like (b) because it pushes the problem to the wrong people: the
packagers. But it's far better than (c).
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-03-01 19:44 ` Thiago Macieira
@ 2021-03-01 20:35 ` Ville Voutilainen
2021-03-01 21:54 ` Thiago Macieira
0 siblings, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-03-01 20:35 UTC (permalink / raw)
To: Thiago Macieira; +Cc: libstdc++, gcc-patches List
On Mon, 1 Mar 2021 at 21:44, Thiago Macieira <thiago.macieira@intel.com> wrote:
> But you can see someone doing:
>
> #if __cplusplus >= 202002L && __has_include(<latch>)
> # include <latch>
> #else
> # error "Please upgrade your compiler & standard library"
> #endif
>
> and using <latch> in their inline code. And as you say, if they then mix DSOs
> compiled with different versions of GCC, one of them might be the old,
> experimental and binary-incompatible code. Remember that before April 2024,
> the most recent Ubuntu LTS will be 22.04, which will be released before GCC
> 12, which means it will contain GCC 11.
>
> So, wholly aside from how we fix the inefficiencies, we must decide how to
> deal with the ABI break. We can:
>
> a) not have an ABI break in the first place, by having the above code not
> compile with GCC until we promise ABI compatibility
> b) cause a non-silent ABI break
> c) accept the silent ABI break
>
> I advocate for (a) and vehemently disagree with (c). Meanwhile, (b) can be
> something like (untested!):
I can't make the above code work, in any reasonable manner, because
it's doing feature detection incorrectly. :)
What I can imagine doing, however, is this:
1) a published IS always bumps feature-macro values (this won't help
now, it's a future fix, we don't currently do this in WG21)
2) an implementation uses the pre-IS draft values before it deems itself stable
3) users who want stability should require the feature-testing macro
to have at least the IS value
4) adventurous users can either use the macro without checking its
value, or use at least the value that gives them
whatever shiny toy they're interested in
With that, we can keep allowing adventurous users to opt in early, and
you have a portable way to detect that
features are stable, for an implementation-defined definition of "stable".
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-03-01 20:35 ` Ville Voutilainen
@ 2021-03-01 21:54 ` Thiago Macieira
2021-03-01 22:04 ` Ville Voutilainen
0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 21:54 UTC (permalink / raw)
To: Ville Voutilainen; +Cc: libstdc++, gcc-patches List
On Monday, 1 March 2021 12:35:05 PST Ville Voutilainen wrote:
> I can't make the above code work, in any reasonable manner, because
> it's doing feature detection incorrectly. :)
> What I can imagine doing, however, is this:
>
> 1) a published IS always bumps feature-macro values (this won't help
> now, it's a future fix, we don't currently do this in WG21)
This is mostly already the case. I haven't seen any need to bump the values.
> 2) an implementation uses the pre-IS draft values before it deems itself
> stable
Before the IS is stable? From what I've seen so far, the macros are always
around the month of the latest draft of a given feature. So if an
implementation implemented an earlier draft, the macro version would increase
in a new draft coming before the IS.
I think we just need to be careful of updates done after the latest draft is
adopted, usually by NBs. Those have to bump the macro too. I think you're in
good speaking terms with the Finland NB :-)
> 3) users who want stability should require the feature-testing macro
> to have at least the IS value
True. That's not the stability I was talking about, but it could be done.
> 4) adventurous users can either use the macro without checking its
> value, or use at least the value that gives them
> whatever shiny toy they're interested in
Fair enough.
Please note I am not talking about the stability of the feature as described
in the standard or a proposal or draft. I am talking about the stability of
the implementation. C++20 is out and the atomic-wait feature is stable, as
defined by the standard. What isn't stable is GCC's implementation of it.
But your suggestion does work. We don't need to apply them to all macros, only
those that are new in a given version, like __cpp_lib_atomic_wait or
__cpp_lib_latch in this case. Alternatively, implementations can set the macro
to a given value below the standard-required one (incl. 1) to indicate that
they haven't achieved stability.
That would make my check:
#if __cplusplus >= 202002L && __has_include(<latch>)
# include <latch>
#endif
#if __cpp_lib_latch < 201907L
# error "Please upgrade your Standard Library"
#endif
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-03-01 21:54 ` Thiago Macieira
@ 2021-03-01 22:04 ` Ville Voutilainen
2021-03-01 22:21 ` Thiago Macieira
0 siblings, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-03-01 22:04 UTC (permalink / raw)
To: Thiago Macieira; +Cc: libstdc++, gcc-patches List
On Mon, 1 Mar 2021 at 23:54, Thiago Macieira <thiago.macieira@intel.com> wrote:
> But your suggestion does work. We don't need to apply them to all macros, only
> those that are new in a given version, like __cpp_lib_atomic_wait or
> __cpp_lib_latch in this case. Alternatively, implementations can set the macro
> to a given value below the standard-required one (incl. 1) to indicate that
> they haven't achieved stability.
>
> That would make my check:
>
> #if __cplusplus >= 202002L && __has_include(<latch>)
> # include <latch>
> #endif
> #if __cpp_lib_latch < 201907L
> # error "Please upgrade your Standard Library"
> #endif
Well, this would be different. What I'm suggesting is not quite that;
for any *new* facility, we'd make sure
that its draft macro and the final IS macro are different, but the
minimum value is the first draft version,
not anything below it. I don't care too much, that approach and yours
would work the same way. Things that already
had an IS value for a macro and haven't changed since don't need to be
changed. And we don't
need to bump all values of existing facilities either, just for those
that got changes, so some existing macros
would be considered lost causes. Like the ones we're talking about,
because the cats are already out of the
bag.
I'll write a paper. That won't help you now, but it gives us tools in
the future.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-03-01 22:04 ` Ville Voutilainen
@ 2021-03-01 22:21 ` Thiago Macieira
2021-03-01 22:31 ` Ville Voutilainen
0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 22:21 UTC (permalink / raw)
To: Ville Voutilainen; +Cc: libstdc++, gcc-patches List
On Monday, 1 March 2021 14:04:34 PST Ville Voutilainen wrote:
> Well, this would be different. What I'm suggesting is not quite that;
> for any *new* facility, we'd make sure
> that its draft macro and the final IS macro are different, but the
> minimum value is the first draft version,
> not anything below it. I don't care too much, that approach and yours
> would work the same way. Things that already
> had an IS value for a macro and haven't changed since don't need to be
> changed. And we don't
> need to bump all values of existing facilities either, just for those
> that got changes, so some existing macros
> would be considered lost causes. Like the ones we're talking about,
> because the cats are already out of the
> bag.
But the code I posted, if people are careful to use write like I did, would
allow us to have the experimental "we're not sure this is right"
implementation of atomic waits, latches, barriers and semaphores right now.
It would simply require that we decrement the macros by 1 in the libstdc++
headers.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-03-01 22:21 ` Thiago Macieira
@ 2021-03-01 22:31 ` Ville Voutilainen
2021-03-01 22:40 ` Thiago Macieira
0 siblings, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-03-01 22:31 UTC (permalink / raw)
To: Thiago Macieira; +Cc: libstdc++, gcc-patches List
On Tue, 2 Mar 2021 at 00:21, Thiago Macieira <thiago.macieira@intel.com> wrote:
> But the code I posted, if people are careful to use write like I did, would
> allow us to have the experimental "we're not sure this is right"
> implementation of atomic waits, latches, barriers and semaphores right now.
The code assumes that as soon as __cplusplus bumps and a header is
present, things
are stable. I don't think that's a safe assumption to make.
Furthermore, the second #if
tries to check a feature-testing macro without including either the
corresponding header
or <version>. That doesn't work. You need to either include <version>
and check a macro,
or check that <latch> exists, then include <latch> and then check the macro.
But other than that, sure, as QoI, vendors could already provide the
standard macros with
numbers that are lower than the standard ever specified. Going
forward, if existing facilities
are changed, this stops working because now you'd have to track the
WPs to see which
values are "vendor-bogus". I find it better to just change the macros
whose facilities changed
during a standard cycle, and in those cases bump the IS macro, that'll
then work going forward.
In the meanwhile, when vendors can, they could use the technique you
describe, but that's
barely worth proposing as an SG10 guideline because they would've
needed to do it already, but didn't. :P
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 4/5] barrier: use int instead of unsigned char for the phase state
2021-03-01 22:31 ` Ville Voutilainen
@ 2021-03-01 22:40 ` Thiago Macieira
0 siblings, 0 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-03-01 22:40 UTC (permalink / raw)
To: Ville Voutilainen; +Cc: libstdc++, gcc-patches List
On Monday, 1 March 2021 14:31:09 PST Ville Voutilainen wrote:
> On Tue, 2 Mar 2021 at 00:21, Thiago Macieira <thiago.macieira@intel.com>
wrote:
> > But the code I posted, if people are careful to use write like I did,
> > would
> > allow us to have the experimental "we're not sure this is right"
> > implementation of atomic waits, latches, barriers and semaphores right
> > now.
>
> The code assumes that as soon as __cplusplus bumps and a header is
> present, things
> are stable.
Not exactly. Re-posting the code for reference:
#if __cplusplus >= 202002L && __has_include(<latch>)
# include <latch>
#endif
#if __cpp_lib_latch < 201907L
# error "Please upgrade your Standard Library"
#endif
The first section simply assumes that <latch> can be #included. The
__cplusplus check is necessary because MSVC STL's headers are present but
can't be #include'd otherwise (they cause #error).
It's the second check that is authoritative.
> I don't think that's a safe assumption to make.
> Furthermore, the second #if
> tries to check a feature-testing macro without including either the
> corresponding header
> or <version>. That doesn't work. You need to either include <version>
> and check a macro,
> or check that <latch> exists, then include <latch> and then check the macro.
<version> would work, but why can't you check the macro anyway? It may trigger
a warning with GCC's -Wundef, but it's just a warning. The preprocessor is
defined to replace any undefined identifier with 0. If you want to avoid the
warning, you could write:
#if defined(__cpp_lib_latch) && __cpp_lib_latch < 201907L
Is there anything I'm not seeing?
> But other than that, sure, as QoI, vendors could already provide the
> standard macros with
> numbers that are lower than the standard ever specified. Going
> forward, if existing facilities
> are changed, this stops working because now you'd have to track the
> WPs to see which
> values are "vendor-bogus". I find it better to just change the macros
> whose facilities changed
> during a standard cycle, and in those cases bump the IS macro, that'll
> then work going forward.
> In the meanwhile, when vendors can, they could use the technique you
> describe, but that's
> barely worth proposing as an SG10 guideline because they would've
> needed to do it already, but didn't. :P
I am not opposed to that. Your solution is better.
But this solution is less work on the standard and still works, even if it
creates a little more work on the users of said features. It's not
unsurmountable, since we often need to check which C++ standard edition it
came with anyway. So it doesn't matter what value it assumes: we'll be
consulting a table anyway.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-26 23:53 ` Thiago Macieira
2021-02-26 23:58 ` Ville Voutilainen
@ 2021-03-03 14:24 ` Jonathan Wakely
2021-03-03 17:12 ` Thiago Macieira
1 sibling, 1 reply; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 14:24 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Ville Voutilainen, Thomas Rodgers, libstdc++
On 26/02/21 15:53 -0800, Thiago Macieira via Libstdc++ wrote:
>On Friday, 26 February 2021 10:47:09 PST Ville Voutilainen wrote:
>> On Fri, 26 Feb 2021 at 20:43, Thiago Macieira via Libstdc++
>>
>> <libstdc++@gcc.gnu.org> wrote:
>> > Oh? That's not a really widely understood fact. I see people using C++20
>> > right now all the time in the cpplang slack and no one seems to be
>> > worried that support is experimental.
>>
>> People writing toys on forums is not real production use that we
>> should worry about, and anyone who uses
>> a two-months-old standard in production use is going to survive
>> possible breaking changes. Or if they won't,
>> that's their stupid fault.
>
>Think about this scenario. Let's say GCC 12 declares to C++20 support to be
>non-experimental.
>
>Then someone writes a production tool that does:
>
>#if __cpp_lib_atomic_wait >= 201907L
> a.wait(n);
>#else
> fallback(a);
>#endif
I agree this is a problem. One created by the feature test macros
(because bfore they were added, you just had to figure out which
version was OK to use for yourself and test __GNUC__ or similar).
But it's not a new problem for GCC 11, and not specific to the atomic
wait feature. It's worth further consideration though.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-02-27 0:44 ` Ville Voutilainen
2021-02-27 0:53 ` Thiago Macieira
@ 2021-03-03 14:30 ` Jonathan Wakely
2021-03-03 17:07 ` Thiago Macieira
1 sibling, 1 reply; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 14:30 UTC (permalink / raw)
To: Ville Voutilainen; +Cc: Thiago Macieira, Thomas Rodgers, libstdc++
On 27/02/21 02:44 +0200, Ville Voutilainen via Libstdc++ wrote:
>On Sat, 27 Feb 2021 at 02:36, Thiago Macieira <thiago.macieira@intel.com> wrote:
>> Those goals seem incompatible. If the feature is experimental and not to be
>> used in production, then we don't want users to adopt the facilities early. We
>> want them to wait until we declare that it's no longer experimental.
>
>Well, we want *bold* users to adopt new facilities early. Not all users.
Yes, exactly. They could still do that if some additional option/macro
was needed to get to those new facilities though.
My concern is that if we add a new "I know what I'm doing" option,
everybody defines it (because that's what blog posts and youtube
videos and conference talks tell them to do, because that gives them
the new shiny things) and then it just becomes a default.
Maybe Qt wouldn't define it, but plenty would.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
` (4 preceding siblings ...)
2021-02-26 18:14 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Andreas Schwab
@ 2021-03-03 14:34 ` Jonathan Wakely
2021-03-03 17:14 ` Thiago Macieira
5 siblings, 1 reply; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 14:34 UTC (permalink / raw)
To: Thiago Macieira; +Cc: libstdc++, gcc-patches
On 26/02/21 07:59 -0800, Thiago Macieira via Libstdc++ wrote:
>ints can be used as futex on Linux. ptrdiff_t on 64-bit Linux can't.
Yes, we should do this for GCC 11.
> libstdc++-v3/include/std/latch | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/libstdc++-v3/include/std/latch b/libstdc++-v3/include/std/latch
>index ef8c301e5e9..156aea5c5e5 100644
>--- a/libstdc++-v3/include/std/latch
>+++ b/libstdc++-v3/include/std/latch
>@@ -48,7 +48,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> public:
> static constexpr ptrdiff_t
> max() noexcept
>- { return __gnu_cxx::__int_traits<ptrdiff_t>::__max; }
>+ { return __gnu_cxx::__int_traits<int>::__max; }
>
> constexpr explicit latch(ptrdiff_t __expected) noexcept
> : _M_a(__expected) { }
>@@ -85,7 +85,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> }
>
> private:
>- alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
>+ alignas(__alignof__(int)) int _M_a;
> };
> _GLIBCXX_END_NAMESPACE_VERSION
> } // namespace
>--
>2.30.1
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int
2021-02-26 15:59 ` [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int Thiago Macieira
@ 2021-03-03 14:34 ` Jonathan Wakely
2021-03-03 16:21 ` Jonathan Wakely
0 siblings, 1 reply; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 14:34 UTC (permalink / raw)
To: Thiago Macieira; +Cc: libstdc++, gcc-patches
On 26/02/21 07:59 -0800, Thiago Macieira via Libstdc++ wrote:
>The kernel doesn't care what we store in those 32 bits, only that they are
>comparable. This commit adds:
> * pointers and long on 32-bit architectures
> * unsigned
> * untyped enums and typed enums on int & unsigned int
> * float
>
>We're not using FUTEX_OP anywhere today. The kernel reserves 4 bits for
>this field but has only used 6 values so far, so it can be extended to
>unsigned compares in the future, if needed.
And this one for GCC 11 too.
> libstdc++-v3/include/bits/atomic_wait.h | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
>diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
>index 424fccbe4c5..1c6bda2e2b6 100644
>--- a/libstdc++-v3/include/bits/atomic_wait.h
>+++ b/libstdc++-v3/include/bits/atomic_wait.h
>@@ -65,7 +65,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> template<typename _Tp>
> inline constexpr bool __platform_wait_uses_type
> #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>- = is_same_v<remove_cv_t<_Tp>, __platform_wait_t>;
>+ = is_scalar_v<remove_cv_t<_Tp>> && sizeof(_Tp) == sizeof(__platform_wait_t)
>+ && alignof(_Tp) >= alignof(__platform_wait_t);
> #else
> = false;
> #endif
>@@ -91,13 +92,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
> template<typename _Tp>
> void
>- __platform_wait(const _Tp* __addr, __platform_wait_t __val) noexcept
>+ __platform_wait(const _Tp* __addr, _Tp __val) noexcept
> {
> for(;;)
> {
> auto __e = syscall (SYS_futex, static_cast<const void*>(__addr),
> static_cast<int>(__futex_wait_flags::__wait_private),
>- __val, nullptr);
>+ static_cast<__platform_wait_t>(__val), nullptr);
> if (!__e || errno == EAGAIN)
> break;
> else if (errno != EINTR)
>--
>2.30.1
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 5/5] barrier: optimise by not having the hasher in a loop
2021-02-26 15:59 ` [PATCH 5/5] barrier: optimise by not having the hasher in a loop Thiago Macieira
@ 2021-03-03 14:36 ` Jonathan Wakely
0 siblings, 0 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 14:36 UTC (permalink / raw)
To: Thiago Macieira; +Cc: libstdc++, gcc-patches
On 26/02/21 07:59 -0800, Thiago Macieira via Libstdc++ wrote:
>Our thread's ID does not change so we don't have to get it every time
>and hash it every time.
This looks good for GCC 11.
(This one wouldn't be an ABI break to change later, but we might as
well do it now, as we have the patch now).
> libstdc++-v3/include/std/barrier | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
>diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
>index ae058bd3dc3..eb31a89b175 100644
>--- a/libstdc++-v3/include/std/barrier
>+++ b/libstdc++-v3/include/std/barrier
>@@ -101,12 +101,10 @@ It looks different from literature pseudocode for two main reasons:
> }
>
> bool
>- _M_arrive(__barrier_phase_t __old_phase)
>+ _M_arrive(__barrier_phase_t __old_phase, size_t __current)
> {
> size_t __current_expected = _M_expected;
>- std::hash<std::thread::id> __hasher;
>- size_t __current = __hasher(std::this_thread::get_id())
>- % ((_M_expected + 1) >> 1);
>+ __current %= ((__current_expected + 1) >> 1);
>
> const auto __half_step = _S_add_to_phase(__old_phase, 1);
> const auto __full_step = _S_add_to_phase(__old_phase, 2);
>@@ -167,11 +165,13 @@ It looks different from literature pseudocode for two main reasons:
> [[nodiscard]] arrival_token
> arrive(ptrdiff_t __update)
> {
>+ std::hash<std::thread::id> __hasher;
>+ size_t __current = __hasher(std::this_thread::get_id());
> __atomic_phase_ref_t __phase(_M_phase);
> const auto __old_phase = __phase.load(memory_order_relaxed);
> for(; __update; --__update)
> {
>- if(_M_arrive(__old_phase))
>+ if(_M_arrive(__old_phase, __current))
> {
> _M_completion();
> _M_expected += _M_expected_adjustment.load(memory_order_relaxed);
>--
>2.30.1
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-03-01 8:56 ` Richard Biener
@ 2021-03-03 14:56 ` Jonathan Wakely
2021-03-03 15:02 ` Andreas Schwab
` (2 more replies)
0 siblings, 3 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 14:56 UTC (permalink / raw)
To: Richard Biener
Cc: Hans-Peter Nilsson, Thiago Macieira, Andreas Schwab, libstdc++,
GCC Patches
On 01/03/21 09:56 +0100, Richard Biener via Libstdc++ wrote:
>On Sun, Feb 28, 2021 at 10:53 PM Hans-Peter Nilsson <hp@bitrange.com> wrote:
>>
>>
>>
>> On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
>>
>> > On Friday, 26 February 2021 11:31:00 PST Andreas Schwab wrote:
>> > > On Feb 26 2021, Thiago Macieira wrote:
>> > > > On Friday, 26 February 2021 10:14:42 PST Andreas Schwab wrote:
>> > > >> On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
>> > > >> > - alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
>> > > >> > + alignas(__alignof__(int)) int _M_a;
>> > > >>
>> > > >> Futexes must be aligned to 4 bytes.
>> > > >
>> > > > Agreed, but doesn't this accomplish that?
>> > >
>> > > No. It uses whatever alignment the type already has, and is an
>> > > elaborate no-op.
>> >
>> > I thought so too when I read the original line. But I expected it was written
>> > like that for a reason, especially since the same pattern appears in other
>> > places.
>> >
>> > I can change to "alignas(4)" (which is a GCC extension, I believe). Is that
>> > the correct solution?
>>
>> IMNSHO make use of the corresponding atomic type. Then there'd
>> be no need for separate what's-the-right-align-curse games.
That won't work though, because we need direct access to the integer
object, not to a std::atomic<int> which contains it.
>Or align as the corresponding atomic type (in case using an actual
>std::atomic<int> is undesirable). OTOH the proposed code isn't
>any more bogus than the previous ...
The previous code accounts for the fact that ptrdiff_t is a typedef
for an unspecified type, and that some ABIs allow struct members to have
weaker alignment than they would have otherwise.
e.g. __alignof__(long long) != alignof(long long) on x86.
Yes, I know ptrdiff_t isn't long long on x86, but since it's a typedef
for some target-specific type, it's still possible that
alignof != __alignof__. So alignas(__alignof__(T)) is not necessarily
a no-op. So not bogus.
For int, there shouldn't be any need to force the alignment. I don't
think any ABI supported by GCC allows int members to be aligned to
less than __alignof__(int). Users could break it by compiling with
#pragma pack, but I have no sympathy for such silliness.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-03-03 14:56 ` Jonathan Wakely
@ 2021-03-03 15:02 ` Andreas Schwab
2021-03-03 15:10 ` Jonathan Wakely
2021-03-03 15:37 ` Hans-Peter Nilsson
2 siblings, 0 replies; 73+ messages in thread
From: Andreas Schwab @ 2021-03-03 15:02 UTC (permalink / raw)
To: Jonathan Wakely
Cc: Richard Biener, Hans-Peter Nilsson, Thiago Macieira, libstdc++,
GCC Patches
On Mär 03 2021, Jonathan Wakely wrote:
> For int, there shouldn't be any need to force the alignment. I don't
> think any ABI supported by GCC allows int members to be aligned to
> less than __alignof__(int).
There is no requirement that __alignof__(int) is big enough.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-03-03 14:56 ` Jonathan Wakely
2021-03-03 15:02 ` Andreas Schwab
@ 2021-03-03 15:10 ` Jonathan Wakely
2021-03-03 15:37 ` Hans-Peter Nilsson
2 siblings, 0 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 15:10 UTC (permalink / raw)
To: Richard Biener
Cc: Hans-Peter Nilsson, Thiago Macieira, Andreas Schwab, libstdc++,
GCC Patches
On 03/03/21 14:56 +0000, Jonathan Wakely wrote:
>On 01/03/21 09:56 +0100, Richard Biener via Libstdc++ wrote:
>>On Sun, Feb 28, 2021 at 10:53 PM Hans-Peter Nilsson <hp@bitrange.com> wrote:
>>>
>>>
>>>
>>>On Fri, 26 Feb 2021, Thiago Macieira via Gcc-patches wrote:
>>>
>>>> On Friday, 26 February 2021 11:31:00 PST Andreas Schwab wrote:
>>>> > On Feb 26 2021, Thiago Macieira wrote:
>>>> > > On Friday, 26 February 2021 10:14:42 PST Andreas Schwab wrote:
>>>> > >> On Feb 26 2021, Thiago Macieira via Gcc-patches wrote:
>>>> > >> > - alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
>>>> > >> > + alignas(__alignof__(int)) int _M_a;
>>>> > >>
>>>> > >> Futexes must be aligned to 4 bytes.
>>>> > >
>>>> > > Agreed, but doesn't this accomplish that?
>>>> >
>>>> > No. It uses whatever alignment the type already has, and is an
>>>> > elaborate no-op.
>>>>
>>>> I thought so too when I read the original line. But I expected it was written
>>>> like that for a reason, especially since the same pattern appears in other
>>>> places.
>>>>
>>>> I can change to "alignas(4)" (which is a GCC extension, I believe). Is that
>>>> the correct solution?
It's not a GCC extensions. You're thinking of alignas(obj) which is a
GCC extension.
>>>IMNSHO make use of the corresponding atomic type. Then there'd
>>>be no need for separate what's-the-right-align-curse games.
>
>That won't work though, because we need direct access to the integer
>object, not to a std::atomic<int> which contains it.
>
>>Or align as the corresponding atomic type (in case using an actual
>>std::atomic<int> is undesirable). OTOH the proposed code isn't
>>any more bogus than the previous ...
>
>The previous code accounts for the fact that ptrdiff_t is a typedef
>for an unspecified type, and that some ABIs allow struct members to have
>weaker alignment than they would have otherwise.
>
>e.g. __alignof__(long long) != alignof(long long) on x86.
>
>Yes, I know ptrdiff_t isn't long long on x86, but since it's a typedef
>for some target-specific type, it's still possible that
>alignof != __alignof__. So alignas(__alignof__(T)) is not necessarily
>a no-op. So not bogus.
>
>For int, there shouldn't be any need to force the alignment. I don't
>think any ABI supported by GCC allows int members to be aligned to
>less than __alignof__(int). Users could break it by compiling with
>#pragma pack, but I have no sympathy for such silliness.
Jakub said on IRC that m68k might have alignof(int) == 2, so we'd need
to increase that alignment to use it as a futex.
N.B. std::atomic and std::atomic_ref don't use alignas(__alignof__(T))
they do this instead:
static_assert(is_trivially_copyable_v<_Tp>);
// 1/2/4/8/16-byte types must be aligned to at least their size.
static constexpr int _S_min_alignment
= (sizeof(_Tp) & (sizeof(_Tp) - 1)) || sizeof(_Tp) > 16
? 0 : sizeof(_Tp);
public:
static constexpr size_t required_alignment
= _S_min_alignment > alignof(_Tp) ? _S_min_alignment : alignof(_Tp);
Using std::atomic_ref<T>::required_alignment would give that value.
For something being used as a futex we should just use alignas(4)
though, since that's what the kernel requires.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-03-03 14:56 ` Jonathan Wakely
2021-03-03 15:02 ` Andreas Schwab
2021-03-03 15:10 ` Jonathan Wakely
@ 2021-03-03 15:37 ` Hans-Peter Nilsson
2 siblings, 0 replies; 73+ messages in thread
From: Hans-Peter Nilsson @ 2021-03-03 15:37 UTC (permalink / raw)
To: Jonathan Wakely
Cc: Richard Biener, Thiago Macieira, Andreas Schwab, libstdc++, GCC Patches
On Wed, 3 Mar 2021, Jonathan Wakely wrote:
> For int, there shouldn't be any need to force the alignment. I don't
> think any ABI supported by GCC allows int members to be aligned to
> less than __alignof__(int).
(sizeof(int) last)
The CRIS ABI does as in default packed, and ISTR there was some
other gcc port too, but I don't recall whether that (too) had no
(current) Linux port.
brgds, H-P
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int
2021-03-03 14:34 ` Jonathan Wakely
@ 2021-03-03 16:21 ` Jonathan Wakely
2021-03-03 17:27 ` Thiago Macieira
2021-03-03 17:34 ` Ville Voutilainen
0 siblings, 2 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 16:21 UTC (permalink / raw)
To: Thiago Macieira; +Cc: libstdc++, gcc-patches
On 03/03/21 14:34 +0000, Jonathan Wakely wrote:
>On 26/02/21 07:59 -0800, Thiago Macieira via Libstdc++ wrote:
>>The kernel doesn't care what we store in those 32 bits, only that they are
>>comparable. This commit adds:
>>* pointers and long on 32-bit architectures
>>* unsigned
>>* untyped enums and typed enums on int & unsigned int
>>* float
>>
>>We're not using FUTEX_OP anywhere today. The kernel reserves 4 bits for
>>this field but has only used 6 values so far, so it can be extended to
>>unsigned compares in the future, if needed.
>
>And this one for GCC 11 too.
>
>>libstdc++-v3/include/bits/atomic_wait.h | 7 ++++---
>>1 file changed, 4 insertions(+), 3 deletions(-)
>>
>>diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
>>index 424fccbe4c5..1c6bda2e2b6 100644
>>--- a/libstdc++-v3/include/bits/atomic_wait.h
>>+++ b/libstdc++-v3/include/bits/atomic_wait.h
>>@@ -65,7 +65,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> template<typename _Tp>
>> inline constexpr bool __platform_wait_uses_type
>>#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>>- = is_same_v<remove_cv_t<_Tp>, __platform_wait_t>;
>>+ = is_scalar_v<remove_cv_t<_Tp>> && sizeof(_Tp) == sizeof(__platform_wait_t)
Oh, except that is_scalar is surprisingly expensive to instantiate
(its defined in a really expensive way) and since we control all uses
of this constant, I don't think we need it. It's only ever used from
atomic waiting functions which are only defined for scalar types.
>>+ && alignof(_Tp) >= alignof(__platform_wait_t);
>>#else
>> = false;
>>#endif
>>@@ -91,13 +92,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>
>> template<typename _Tp>
>> void
>>- __platform_wait(const _Tp* __addr, __platform_wait_t __val) noexcept
>>+ __platform_wait(const _Tp* __addr, _Tp __val) noexcept
>> {
>> for(;;)
>> {
>> auto __e = syscall (SYS_futex, static_cast<const void*>(__addr),
>> static_cast<int>(__futex_wait_flags::__wait_private),
>>- __val, nullptr);
>>+ static_cast<__platform_wait_t>(__val), nullptr);
>> if (!__e || errno == EAGAIN)
>> break;
>> else if (errno != EINTR)
>>--
>>2.30.1
>>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-03-03 14:30 ` Jonathan Wakely
@ 2021-03-03 17:07 ` Thiago Macieira
2021-03-03 17:14 ` Jonathan Wakely
0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-03-03 17:07 UTC (permalink / raw)
To: Ville Voutilainen, Jonathan Wakely; +Cc: Thomas Rodgers, libstdc++
On Wednesday, 3 March 2021 06:30:20 PST Jonathan Wakely wrote:
> Yes, exactly. They could still do that if some additional option/macro
> was needed to get to those new facilities though.
>
> My concern is that if we add a new "I know what I'm doing" option,
> everybody defines it (because that's what blog posts and youtube
> videos and conference talks tell them to do, because that gives them
> the new shiny things) and then it just becomes a default.
>
> Maybe Qt wouldn't define it, but plenty would.
Well, if you define the "void warranty" option and you break things, you get
to keep both pieces.
Another option is to fiddle with the include path.
$ gcc -v -E -xc++ /dev/null
Using built-in specs.
[...]
#include <...> search starts here:
/usr/lib64/gcc/x86_64-generic-linux/10/../../../../include/c++/10
/usr/lib64/gcc/x86_64-generic-linux/10/../../../../include/c++/10/x86_64-
generic-linux
/usr/lib64/gcc/x86_64-generic-linux/10/../../../../include/c++/10/backward
/usr/lib64/gcc/x86_64-generic-linux/10/include
/usr/local/include
/usr/lib64/gcc/x86_64-generic-linux/10/include-fixed
/usr/include
That "backward" has always got me thinking. What if we had a command-line
option like -fexperimental-c++ option that added the $CXXROOT/experimental
(complete with its bits/ and $ARCH subdirs)? That way, the regular
atomic_base.h could do:
#if __cplusplus > 201703L && __has_include(<bits/atomic_wait.h>)
And the second conditional would fail unless the option is provided.
The advantage of a compiler option is that you need each DSO to consciously
add something to their build systems and that can't be done by header-only
libraries. That's like MSVC's /std:c++latest, which is documented by MS as
"here there be dragons".
GCC can also generate some extra bit of assembly to mark the final executable
as using experimental API, so distributors can easily tell that unfinished API
was used. I've successfully done this for Qt using ELF versions.
$ rpm -qR kwin5 | grep PRIVATE
libQt5Core.so.5(Qt_5.15.2_PRIVATE_API)(64bit)
libQt5Gui.so.5(Qt_5.15.2_PRIVATE_API)(64bit)
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-03-03 14:24 ` Jonathan Wakely
@ 2021-03-03 17:12 ` Thiago Macieira
0 siblings, 0 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-03-03 17:12 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: Ville Voutilainen, Thomas Rodgers, libstdc++
On Wednesday, 3 March 2021 06:24:45 PST Jonathan Wakely wrote:
> But it's not a new problem for GCC 11, and not specific to the atomic
> wait feature. It's worth further consideration though.
It isn't, except that in this case we know, before release, that ABI will
silently break in ways that may cause heisenbugs and deadlocks. Given that we
know that, I'm asking for a conscious decision on one of three options:
1) make the functionality unavailable (by default or at all)
2) make the ABI break non-silent when it comes
3) accept the silent ABI break
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-03-03 14:34 ` Jonathan Wakely
@ 2021-03-03 17:14 ` Thiago Macieira
2021-03-03 17:18 ` Jonathan Wakely
0 siblings, 1 reply; 73+ messages in thread
From: Thiago Macieira @ 2021-03-03 17:14 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: libstdc++, gcc-patches
On Wednesday, 3 March 2021 06:34:26 PST Jonathan Wakely wrote:
> On 26/02/21 07:59 -0800, Thiago Macieira via Libstdc++ wrote:
> >ints can be used as futex on Linux. ptrdiff_t on 64-bit Linux can't.
>
> Yes, we should do this for GCC 11.
Want me to re-submit this one alone, with the "alignas(4)" with a commend
indicating it's what the kernel requires?
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: C++2a synchronisation inefficient in GCC 11
2021-03-03 17:07 ` Thiago Macieira
@ 2021-03-03 17:14 ` Jonathan Wakely
0 siblings, 0 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 17:14 UTC (permalink / raw)
To: Thiago Macieira; +Cc: Ville Voutilainen, Thomas Rodgers, libstdc++
On 03/03/21 09:07 -0800, Thiago Macieira via Libstdc++ wrote:
>On Wednesday, 3 March 2021 06:30:20 PST Jonathan Wakely wrote:
>> Yes, exactly. They could still do that if some additional option/macro
>> was needed to get to those new facilities though.
>>
>> My concern is that if we add a new "I know what I'm doing" option,
>> everybody defines it (because that's what blog posts and youtube
>> videos and conference talks tell them to do, because that gives them
>> the new shiny things) and then it just becomes a default.
>>
>> Maybe Qt wouldn't define it, but plenty would.
>
>Well, if you define the "void warranty" option and you break things, you get
>to keep both pieces.
>
>Another option is to fiddle with the include path.
>
>$ gcc -v -E -xc++ /dev/null
>Using built-in specs.
>[...]
>#include <...> search starts here:
> /usr/lib64/gcc/x86_64-generic-linux/10/../../../../include/c++/10
> /usr/lib64/gcc/x86_64-generic-linux/10/../../../../include/c++/10/x86_64-
>generic-linux
> /usr/lib64/gcc/x86_64-generic-linux/10/../../../../include/c++/10/backward
> /usr/lib64/gcc/x86_64-generic-linux/10/include
> /usr/local/include
> /usr/lib64/gcc/x86_64-generic-linux/10/include-fixed
> /usr/include
>
>That "backward" has always got me thinking. What if we had a command-line
>option like -fexperimental-c++ option that added the $CXXROOT/experimental
We already have an experimental sub-dir for TS headers, so you need to
pick another name. Otherwise <vector> and <experimental/vector> get
mixed up.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int
2021-03-03 17:14 ` Thiago Macieira
@ 2021-03-03 17:18 ` Jonathan Wakely
0 siblings, 0 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 17:18 UTC (permalink / raw)
To: Thiago Macieira; +Cc: libstdc++, gcc-patches
On 03/03/21 09:14 -0800, Thiago Macieira via Libstdc++ wrote:
>On Wednesday, 3 March 2021 06:34:26 PST Jonathan Wakely wrote:
>> On 26/02/21 07:59 -0800, Thiago Macieira via Libstdc++ wrote:
>> >ints can be used as futex on Linux. ptrdiff_t on 64-bit Linux can't.
>>
>> Yes, we should do this for GCC 11.
>
>Want me to re-submit this one alone, with the "alignas(4)" with a commend
>indicating it's what the kernel requires?
Yes please.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int
2021-03-03 16:21 ` Jonathan Wakely
@ 2021-03-03 17:27 ` Thiago Macieira
2021-03-03 17:34 ` Ville Voutilainen
1 sibling, 0 replies; 73+ messages in thread
From: Thiago Macieira @ 2021-03-03 17:27 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: libstdc++, gcc-patches
On Wednesday, 3 March 2021 08:21:51 PST Jonathan Wakely wrote:
> >>- = is_same_v<remove_cv_t<_Tp>, __platform_wait_t>;
> >>+ = is_scalar_v<remove_cv_t<_Tp>> && sizeof(_Tp) ==
> >>sizeof(__platform_wait_t)
> Oh, except that is_scalar is surprisingly expensive to instantiate
> (its defined in a really expensive way) and since we control all uses
> of this constant, I don't think we need it. It's only ever used from
> atomic waiting functions which are only defined for scalar types.
Thanks. Will update and re-submit.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int
2021-03-03 16:21 ` Jonathan Wakely
2021-03-03 17:27 ` Thiago Macieira
@ 2021-03-03 17:34 ` Ville Voutilainen
2021-03-03 17:41 ` Jonathan Wakely
1 sibling, 1 reply; 73+ messages in thread
From: Ville Voutilainen @ 2021-03-03 17:34 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: Thiago Macieira, libstdc++, gcc-patches List
On Wed, 3 Mar 2021 at 19:25, Jonathan Wakely via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
> Oh, except that is_scalar is surprisingly expensive to instantiate
> (its defined in a really expensive way) and since we control all uses
I'll be more than happy to write you an __is_scalar for GCC 12. :P
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int
2021-03-03 17:34 ` Ville Voutilainen
@ 2021-03-03 17:41 ` Jonathan Wakely
0 siblings, 0 replies; 73+ messages in thread
From: Jonathan Wakely @ 2021-03-03 17:41 UTC (permalink / raw)
To: Ville Voutilainen; +Cc: Thiago Macieira, libstdc++, gcc-patches List
On 03/03/21 19:34 +0200, Ville Voutilainen wrote:
>On Wed, 3 Mar 2021 at 19:25, Jonathan Wakely via Libstdc++
><libstdc++@gcc.gnu.org> wrote:
>> Oh, except that is_scalar is surprisingly expensive to instantiate
>> (its defined in a really expensive way) and since we control all uses
>
>I'll be more than happy to write you an __is_scalar for GCC 12. :P
That would help with https://gcc.gnu.org/PR96710 too. The current
implementation is inefficient and arguably wrong.
^ permalink raw reply [flat|nested] 73+ messages in thread
end of thread, other threads:[~2021-03-03 17:41 UTC | newest]
Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-25 22:50 C++2a synchronisation inefficient in GCC 11 Thiago Macieira
2021-02-26 11:19 ` Jonathan Wakely
2021-02-26 17:37 ` Thiago Macieira
2021-02-26 18:29 ` Jonathan Wakely
2021-02-26 19:30 ` Ville Voutilainen
2021-02-26 21:17 ` Jonathan Wakely
2021-02-26 21:18 ` Ville Voutilainen
2021-02-26 21:39 ` Jonathan Wakely
2021-02-26 18:47 ` Ville Voutilainen
2021-02-26 23:53 ` Thiago Macieira
2021-02-26 23:58 ` Ville Voutilainen
2021-02-27 0:11 ` Thiago Macieira
2021-02-27 0:18 ` Ville Voutilainen
2021-02-27 0:36 ` Thiago Macieira
2021-02-27 0:44 ` Ville Voutilainen
2021-02-27 0:53 ` Thiago Macieira
2021-02-27 1:03 ` Ville Voutilainen
2021-03-03 14:30 ` Jonathan Wakely
2021-03-03 17:07 ` Thiago Macieira
2021-03-03 17:14 ` Jonathan Wakely
2021-02-27 0:22 ` Marc Glisse
2021-02-27 0:30 ` Ville Voutilainen
2021-02-27 0:43 ` Thiago Macieira
2021-03-03 14:24 ` Jonathan Wakely
2021-03-03 17:12 ` Thiago Macieira
2021-02-26 15:59 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Thiago Macieira
2021-02-26 15:59 ` [PATCH 2/5] Atomic __platform_wait: accept any 32-bit type, not just int Thiago Macieira
2021-03-03 14:34 ` Jonathan Wakely
2021-03-03 16:21 ` Jonathan Wakely
2021-03-03 17:27 ` Thiago Macieira
2021-03-03 17:34 ` Ville Voutilainen
2021-03-03 17:41 ` Jonathan Wakely
2021-02-26 15:59 ` [PATCH 3/5] std::__atomic_wait: don't use __detail::__waiter with futex Thiago Macieira
2021-02-26 15:59 ` [PATCH 4/5] barrier: use int instead of unsigned char for the phase state Thiago Macieira
2021-02-28 15:05 ` Hans-Peter Nilsson
2021-03-01 16:28 ` Thomas Rodgers
2021-03-01 17:24 ` Thiago Macieira
2021-03-01 17:38 ` Thomas Rodgers
2021-03-01 17:40 ` Thomas Rodgers
2021-03-01 18:06 ` Thiago Macieira
2021-03-01 19:08 ` Thomas Rodgers
2021-03-01 18:12 ` Ville Voutilainen
2021-03-01 19:44 ` Thiago Macieira
2021-03-01 20:35 ` Ville Voutilainen
2021-03-01 21:54 ` Thiago Macieira
2021-03-01 22:04 ` Ville Voutilainen
2021-03-01 22:21 ` Thiago Macieira
2021-03-01 22:31 ` Ville Voutilainen
2021-03-01 22:40 ` Thiago Macieira
2021-02-26 15:59 ` [PATCH 5/5] barrier: optimise by not having the hasher in a loop Thiago Macieira
2021-03-03 14:36 ` Jonathan Wakely
2021-02-26 18:14 ` [PATCH 1/5] std::latch: reduce internal implementation from ptrdiff_t to int Andreas Schwab
2021-02-26 19:08 ` Thiago Macieira
2021-02-26 19:31 ` Andreas Schwab
2021-02-27 0:13 ` Thiago Macieira
2021-02-28 21:31 ` Hans-Peter Nilsson
2021-03-01 8:56 ` Richard Biener
2021-03-03 14:56 ` Jonathan Wakely
2021-03-03 15:02 ` Andreas Schwab
2021-03-03 15:10 ` Jonathan Wakely
2021-03-03 15:37 ` Hans-Peter Nilsson
2021-03-01 16:32 ` Thomas Rodgers
2021-03-03 14:34 ` Jonathan Wakely
2021-03-03 17:14 ` Thiago Macieira
2021-03-03 17:18 ` Jonathan Wakely
2021-02-27 1:13 ` C++2a synchronisation inefficient in GCC 11 Thomas Rodgers
2021-02-27 1:29 ` Thomas Rodgers
2021-02-27 3:01 ` Thomas Rodgers
2021-03-01 17:46 ` Thomas Rodgers
2021-03-01 18:00 ` Thiago Macieira
2021-03-01 18:34 ` Thomas Rodgers
2021-03-01 19:11 ` Thiago Macieira
2021-02-27 2:02 ` Ville Voutilainen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).