Hello.

With the current [posix-y] implementation of std::call_once, we are seeing significant issues for Darwin. The issues vary in strength from “hangs or segvs sometimes depending on optimisation and the day of the week” (newer OS, faster h/w) to “segvs or std::terminates every time” (older OS, slower h/w).

Actually, I think that in the presence of exceptions it is not working on Linux either.

As a use-case: this problem prevents us from building working LLVM/clang tools on Darwin using GCC as the bootstrap compiler.

The test code I am using is: https://godbolt.org/z/Mxcnv9YEx (derived from a cppreference example, IIRC).

-----

So I was investigating what/why there might be issues and...

* ISTM that using the underlying pthread_once implementation is not playing at all well with exceptions (since pthread_once is oblivious to such).
* the current implementation cannot support nested cases.

-----

Here is a possible replacement implementation with some advantages:

* does not require TLS, it is all stack-based.
* does not use global state (so it can support nested cases).
* it actually works reliably in real use-cases (on old and new Darwin, at least).

And one immediately noted disadvantage:

* once_flag is significantly larger with this implementation (since it gains a mutex and a condition var).

-----

This is a prototype, so I’m looking for advice/comment on:

(a) if this can be generalised to be part of libstdc++
(b) what that would take ...
(c) ... or if I should figure out how to make this a Darwin-specific impl.

-----

Here is the prototype implementation as a non-patch (patch attached).

mutex:

#ifdef _GLIBCXX_HAS_GTHREADS

  /// Flag type used by std::call_once
  struct once_flag
  {
    /// Constructor
    constexpr once_flag() noexcept
    : _M_state_(0), _M_mutx_(__GTHREAD_MUTEX_INIT),
      _M_condv_(__GTHREAD_COND_INIT) {}

    void __do_call_once(void (*)(void*), void*);

    template<typename _Callable, typename... _Args>
      friend void
      call_once(once_flag& __once, _Callable&& __f, _Args&&... __args);

  private:
    __gthread_mutex_t _M_mutx_;
    __gthread_cond_t _M_condv_;
    // call state: 0 = init, 1 = someone is trying, 2 = done.
    atomic_uint _M_state_;

    /// Deleted copy constructor
    once_flag(const once_flag&) = delete;
    /// Deleted assignment operator
    once_flag& operator=(const once_flag&) = delete;
  };

  /// Invoke a callable and synchronize with other calls using the same flag
  template<typename _Callable, typename... _Args>
    void
    call_once (once_flag& __flag, _Callable&& __f, _Args&&... __args)
    {
      if (__flag._M_state_.load (std::memory_order_acquire) == 2)
        return;

      // Closure type that runs the original function with the supplied args.
      auto __callable = [&] {
          std::__invoke(std::forward<_Callable>(__f),
                        std::forward<_Args>(__args)...);
      };
      // Trampoline to call the actual fn; we will pass in the closure address.
      void (*__oc_tramp)(void*)
        = [] (void *ca) { (*static_cast<decltype(__callable)*>(ca))(); };
      // Attempt to do it and synchronize with any other threads that are also
      // trying.
      __flag.__do_call_once (__oc_tramp, std::__addressof(__callable));
    }

#else // _GLIBCXX_HAS_GTHREADS

-----

mutex.cc:

// This calls the trampoline lambda, passing the address of the closure
// representing the original function and its arguments.
void
once_flag::__do_call_once (void (*func)(void*), void *arg)
{
  __gthread_mutex_lock(&_M_mutx_);
  while (this->_M_state_.load (std::memory_order_relaxed) == 1)
    __gthread_cond_wait(&_M_condv_, &_M_mutx_);

  // mutex locked, the most likely outcome is that the once-call completed
  // on some other thread, so we are done.
  if (_M_state_.load (std::memory_order_acquire) == 2)
    {
      __gthread_mutex_unlock(&_M_mutx_);
      return;
    }

  // mutex locked; if we get here, we expect the state to be 0, this would
  // correspond to an exception thrown by the previous thread that tried to
  // do the once_call.
  __glibcxx_assert (_M_state_.load (std::memory_order_acquire) == 0);

  try
    {
      // mutex locked.
      _M_state_.store (1, std::memory_order_relaxed);
      __gthread_mutex_unlock (&_M_mutx_);
      func (arg);
      // We got here without an exception, so the call is done.
      // If the underlying implementation is pthreads, then it is possible
      // to trigger a sequence of events where wake-ups are lost - unless the
      // mutex associated with the condition var is locked around the relevant
      // broadcast (or signal).
      __gthread_mutex_lock(&_M_mutx_);
      _M_state_.store (2, std::memory_order_release);
      __gthread_cond_broadcast (&_M_condv_);
      __gthread_mutex_unlock (&_M_mutx_);
    }
  catch (...)
    {
      // mutex unlocked.
      // func raised an exception, let someone else try ...
      // See above.
      __gthread_mutex_lock(&_M_mutx_);
      _M_state_.store (0, std::memory_order_release);
      __gthread_cond_broadcast (&_M_condv_);
      __gthread_mutex_unlock (&_M_mutx_);
      // ... and pass the exception to our caller.
      throw;
    }
}

=====

The implementation in mutex.cc can sit together with the old version so that the symbols for that remain available (or, for versioned libraries, the old code can be deleted).

thanks
Iain
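
For anyone who wants to experiment with the three-state protocol (0 = init, 1 = someone is trying, 2 = done) outside of libstdc++, here is a minimal portable sketch of the same idea. It is an illustration only, not the attached patch: it assumes std::mutex and std::condition_variable in place of the gthreads primitives, it drops the type-erasing trampoline, and the names sketch_once_flag and sketch_call_once are hypothetical.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <utility>

// Hypothetical stand-in for once_flag; not the libstdc++ type.
struct sketch_once_flag
{
  std::mutex mtx;
  std::condition_variable cv;
  // call state: 0 = init, 1 = someone is trying, 2 = done.
  std::atomic<unsigned> state{0};
};

// Hypothetical stand-in for std::call_once, following the prototype's steps.
template<typename Callable, typename... Args>
void
sketch_call_once (sketch_once_flag& flag, Callable&& f, Args&&... args)
{
  // Fast path: the call already completed on some thread.
  if (flag.state.load (std::memory_order_acquire) == 2)
    return;

  {
    std::unique_lock<std::mutex> lock (flag.mtx);
    // Sleep while another thread is in the middle of an attempt.
    flag.cv.wait (lock, [&] {
      return flag.state.load (std::memory_order_relaxed) != 1;
    });
    if (flag.state.load (std::memory_order_acquire) == 2)
      return;
    // State is 0 here: either nobody has tried, or the last attempt threw.
    flag.state.store (1, std::memory_order_relaxed);
  } // The mutex is released while the callable runs.

  // Publish the outcome and wake the waiters with the mutex held, mirroring
  // the prototype's note about lost wake-ups.
  auto finish = [&] (unsigned new_state) {
    std::lock_guard<std::mutex> lock (flag.mtx);
    flag.state.store (new_state, std::memory_order_release);
    flag.cv.notify_all ();
  };

  try
    {
      std::forward<Callable> (f) (std::forward<Args> (args)...);
    }
  catch (...)
    {
      finish (0); // Let someone else try ...
      throw;      // ... and pass the exception to our caller.
    }
  finish (2);
}
```

Because all of the state lives in the flag object, a callable that throws leaves the flag at 0 so that a later call retries, and a callable may itself call sketch_call_once on a different flag without interference.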